HK40044289A - System and method for obtaining training data - Google Patents
Description
Cross Reference to Related Applications
This application incorporates by reference the entire contents of U.S. provisional application No. 62/731,651, entitled "NEURAL NETWORK TRAINING," filed September 14, 2018.
Any and all applications filed with this application for which foreign or domestic priority requirements are identified in the application data sheet are incorporated herein by reference in their entirety, according to 37 CFR 1.57.
Technical Field
The present disclosure relates to systems and techniques for machine learning. More specifically, the present disclosure relates to techniques for generation of training data.
Background
Deep learning systems used for applications such as autonomous driving are developed by training machine learning models. In general, the performance of a deep learning system is limited, at least in part, by the quality of the training set used to train the model. In many instances, significant resources are expended to collect, curate, and annotate training data. The effort required to create a training set can be large and is often tedious. Furthermore, for certain use cases in which a machine learning model needs to be improved, it is often difficult to collect the relevant data.
Drawings
The following drawings and associated descriptions are provided to illustrate embodiments of the disclosure and not to limit the scope of the claims. The aspects and many of the attendant advantages of this disclosure will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
FIG. 1A is a schematic view illustrating an automobile driving on a road and detecting a tire located on the road.
FIG. 1B is a block diagram illustrating one embodiment of a system for generating training data.
FIG. 2 is a flow diagram illustrating an embodiment of a process for applying trigger classifiers to intermediate results of a machine learning model.
FIG. 3 is a flow diagram illustrating an embodiment of a process for creating a trigger classifier using intermediate results of a machine learning model.
FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data and transmitting sensor data using a trigger classifier.
FIG. 5 is a flow diagram illustrating an embodiment of a process for deploying training data from data corresponding to use cases identified by a trigger classifier.
FIG. 6 is a flow diagram illustrating an embodiment of a process for performing classifier selection and transmitting sensor data on a vehicle.
FIG. 7 is a block diagram illustrating an embodiment of a deep learning system for identifying potential training data.
Detailed Description
One or more innovations are described herein, which may be implemented in a variety of ways, including as a process, an apparatus, a system, a composition of matter, a computer program product embodied on a computer-readable storage medium, and/or a processor (such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor). In this specification, these implementations, or any other form that the innovation may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the innovation. Unless otherwise specified, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or as a specific component that is manufactured to perform the task. As used herein, the term "processor" refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
The following detailed description of one or more embodiments of one or more innovations is provided along with the accompanying drawings that illustrate the principles of the innovations. The innovations are described in connection with such embodiments, but the innovations are not limited to any embodiment. The scope of the innovations is limited only by the claims, and the innovations include many alternatives, modifications, and equivalents. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the innovation. These details are provided for the purpose of example and the innovation may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the innovation has not been described in detail so that the innovation is not unnecessarily obscured.
Introduction
The present specification describes innovations that solve at least the following technical problems. Effective machine learning techniques rely on training data sets that are used to inform the underlying machine learning model. For example, a neural network may be trained using thousands, hundreds of thousands, millions, and so on, of examples. During training, these examples may be used to adjust parameters (e.g., weights, biases, etc.) of the neural network. Additionally, these examples may be used to adjust hyperparameters (e.g., the number of layers) of the neural network. Thus, access to training data is a limiting factor in the use of such machine learning techniques.
As machine learning models become more complex, such as deeper neural networks, the need for large training data sets has correspondingly increased. These deeper neural networks may require a greater number of training examples than shallower neural networks to ensure that they generalize well. For example, while a neural network may be trained to be highly accurate with respect to its training data, it may not generalize well to unseen future examples. In this example, the neural network may benefit from additional examples being included in the training data.
It should be appreciated that there may be significant technical hurdles to acquiring training data. For example, certain machine learning models may be used to classify features or objects included in an image. In this example, a machine learning model may learn to distinguish a first object (e.g., a car) from a second object (e.g., a stop sign). The effectiveness of these machine learning models may be limited by the number of available instances of a feature or object. For example, an entity may wish for a machine learning model to identify a bicycle being ridden on the street. As another example, an entity may wish for a machine learning model to identify a bicycle carried at the rear of an automobile. Absent a sufficient number of such training examples, the machine learning model's identifications may not be accurate enough to be useful. Often, a substantial amount of effort is required for an entity to have people label images as including certain features or objects. For example, a person may have to manually review an image and then assign labels to the portions of the image that correspond to certain features or objects.
One embodiment is a system and method that solves this problem by quickly generating training data. In one embodiment, the training data may include examples of any desired learnable feature. With respect to computer vision, training data may be generated quickly that includes examples of any desired objects or features in images. These objects or features may often represent "edge cases" that are difficult to identify. For example, training data may be generated that includes images of complex scenes desired by the entity. In this example, the entity may prefer to acquire images depicting a bicycle on the rear or front of a vehicle (e.g., a bicycle carried on the front rack of a bus).
The entities described above may utilize numerous (e.g., thousands, millions) of vehicles traveling on various roads or other navigable areas of the world. These vehicles may include or otherwise have access to a sensor (e.g., a camera). As these vehicles travel around, they can capture sensor information. For example, sensor information may be captured during normal operation of the vehicle. The sensor information may be used by the vehicle for certain autopilot features such as lane navigation. However, in one embodiment, the system includes circuitry and software that allows the vehicle to collect examples of image features or objects desired by the entity to be used as training data for the machine learning system.
For example, classifiers (e.g., small or shallow neural networks, support vector machines, etc.) can be uploaded to at least a portion of the vehicles. A vehicle may acquire sensor information (e.g., images, video) during normal operation, and a classifier may be configured to identify particular features or objects represented in the sensor information. These classifiers may be trained to classify images as including particular image features or objects before being provided to the vehicles. For example, a limited number of examples (e.g., one hundred, one thousand, etc.) of a particular image feature or object may be used to train these classifiers. As will be described, a classifier can then use information from an intermediate layer of a machine learning model executing on the vehicle to classify the sensor data. An example machine learning model may include a convolutional neural network. The example machine learning model may be used, at least in part, for the autonomous driving features described above. Thus, the classifiers can leverage an existing machine learning model.
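A minimal sketch of this arrangement follows. The toy "backbone" layers, the feature dimensions, the weight values, and the 0.5 threshold are illustrative assumptions standing in for a real convolutional neural network and a trained trigger classifier:

```python
import math

# Hypothetical frozen backbone: each "layer" here is a fixed elementwise
# transformation standing in for a real convolutional layer.
def backbone_features(pixels, tap_layer):
    """Run the first `tap_layer` layers and return the intermediate features."""
    features = pixels
    for _ in range(tap_layer):
        features = [0.5 * f + 0.1 for f in features]
    return features

# The trigger classifier is a small linear model over those features.
def trigger_score(features, weights, bias):
    """Return a score in (0, 1) via a logistic over a weighted sum."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

pixels = [0.2, 0.8, 0.5, 0.1]
features = backbone_features(pixels, tap_layer=3)  # tap the output of layer 3
score = trigger_score(features, weights=[1.0, 2.0, -0.5, 0.3], bias=-0.2)
keep = score > 0.5  # flag the frame as candidate training data
```

Because the backbone already runs as part of normal vehicle operation, the only added cost per trigger classifier in this sketch is the small weighted sum.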
Many of these classifiers can be uploaded to a computer system within a vehicle, such that each classifier can be used to identify the particular image feature or object associated with that classifier. The captured images designated by a classifier as including a particular feature or object may then be transmitted to a central server system and used as training data for the neural network system. A classifier can be efficient in its processing requirements because it can leverage an existing machine learning model already executed by the vehicle in typical operation. Additionally, there may be a large number of vehicles traveling in different environments, which increases the likelihood of obtaining examples of "edge cases" in which certain features are otherwise difficult to find. In this manner, an entity may quickly acquire sensor information (e.g., images) that represents particular image features or objects of interest to the entity.
In this specification, an object or feature to be learned may represent any real-world object, scene, feature, etc. that may be captured in sensor data. Example objects or features may include tires in the road, tunnel exits, bicycles, trees with branches protruding into the road, scenes in which the vehicle performs a particular action or maneuver or is oriented in a particular manner, and so forth. Further, reference is made herein to identifying training data for use cases or purposes. An example use case or purpose may include identifying one or more objects, features, etc. Additionally, although the present description describes a vehicle acquiring sensor information, such as an image, it is to be understood that the features described herein may be broadly applicable. For example, a classifier may be provided to a user device (e.g., a smartphone) and used to identify particular image features or objects. As another example, the classifier may be used for airplanes, drones, unmanned vehicles, and the like.
Generation of training data
A neural network training technique for identifying additional training data related to a particular use case is disclosed. By identifying and collecting additional training data, particularly data that is difficult to analyze correctly for a use case, deep learning systems can be retrained to improve their performance. For example, difficult use cases may be identified, and data may be collected based on those use cases. The newly collected data can then be used to train a new machine learning model that outperforms the old model. In various embodiments, existing machine learning models are utilized with trigger classifiers to identify relevant training data. The relevant training data is then passed back for processing to create new training data. In some embodiments, an initial dataset representing a target use case is created and used to create a trigger classifier.
For example, deep learning systems for autonomous driving may have difficulty analyzing and identifying tunnel exits. A training data set is created using positive and negative examples of tunnel exits. In some embodiments, the trigger classifier is trained on an initial training data set using intermediate outputs of layers of an existing machine learning model. In some embodiments, the layer is an intermediate layer. For example, data from the training set is fed into an existing machine learning model, and the output of the second-to-last layer of the model is used as input to train the trigger classifier. In some embodiments, the trigger classifier is a support vector machine trained offline, apart from the deployed deep learning application. Once trained, the trigger classifier can be installed or deployed to operate with a deep learning system already in use in the autonomous driving system of a vehicle. For example, the trigger classifier may be deployed over a wireless network that downloads and installs the trigger classifier in the vehicle. The trigger classifier is applied to intermediate outputs of the same layer of the deployed deep learning system to determine a classifier score. In some embodiments, the input to the trigger classifier is an intermediate output of a layer of a convolutional neural network (CNN) that is applied to sensor data captured by the autonomous vehicle, such as image data captured by a camera on the vehicle.
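Training a trigger classifier on intermediate outputs can be sketched as below. For brevity, the sketch uses a tiny logistic-regression classifier rather than a support vector machine, and the two-dimensional "penultimate-layer" feature vectors are invented toy values, not outputs of any real model:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_trigger_classifier(examples, labels, lr=0.5, epochs=200):
    """Fit a tiny logistic-regression trigger classifier with gradient descent.

    `examples` are intermediate-layer feature vectors from the existing model;
    `labels` are 1 for the target use case (e.g., tunnel exit) and 0 otherwise.
    """
    dim = len(examples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the logistic loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Toy penultimate-layer features (assumed values for illustration):
positives = [[0.9, 0.8], [0.8, 0.9], [0.95, 0.7]]   # tunnel exits
negatives = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25]]  # everything else
w, b = train_trigger_classifier(positives + negatives, [1, 1, 1, 0, 0, 0])

score_pos = sigmoid(sum(wi * xi for wi, xi in zip(w, positives[0])) + b)
score_neg = sigmoid(sum(wi * xi for wi, xi in zip(w, negatives[0])) + b)
```

Once the weights are fit offline, deploying the classifier amounts to shipping `w` and `b` to the vehicle, where the same weighted-sum scoring runs against live intermediate outputs.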
In some embodiments, a trigger classifier implemented using a single support vector machine, a small neural network, or another suitable classifier may be applied to the entire captured image and/or to particular locations of the image. For example, the trigger classifier may be applied to each individual location or to a subset of locations of the image. Trigger classifiers can thus be applied to efficiently scan across a neural network's features in space to identify small components, such as shopping carts, animals, and the like. Once applied, the trigger classifier determines a classifier score, and depending on the score, the sensor data is identified and retained as potentially useful training data. As one example, the trigger classifier scores sensor data from the cameras based on the likelihood that the data represents a tunnel exit. Sensor data that scores high, and is thus likely to represent a tunnel exit, is retained and flagged to be used as training data. In some embodiments, a trigger attribute, such as a filter, is applied to a trigger classifier to determine conditions that must be met before the classifier score is determined, conditions under which the classifier score must exceed a threshold, and/or conditions necessary for sensor data to be preserved. For example, in some embodiments, sensor data is scored and collected at most once per interval, such as no more than once every 30 minutes. In some embodiments, the classifier score must exceed a threshold for the sensor data to be collected and retained. If the sensor data meets the configured threshold, the sensor data is retained and used as potentially new training data. In one embodiment, the sensor data is uploaded wirelessly to a server that manages the training data system.
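The threshold and once-per-interval gating described above might look like the following sketch. The 30-minute interval mirrors the example in this paragraph, while the 0.7 threshold and the function name are assumptions:

```python
def should_retain(score, now, last_capture_time,
                  threshold=0.7, min_interval=30 * 60):
    """Gate sensor-data retention on a score threshold and a rate limit.

    Returns (retain, new_last_capture_time). Times are in seconds.
    """
    if score < threshold:
        return False, last_capture_time
    if last_capture_time is not None and now - last_capture_time < min_interval:
        return False, last_capture_time  # rate-limited: at most once per interval
    return True, now

retain1, t = should_retain(0.90, now=1000.0, last_capture_time=None)
retain2, t = should_retain(0.95, now=1500.0, last_capture_time=t)   # too soon
retain3, t = should_retain(0.80, now=2900.0, last_capture_time=1000.0)
```

In this sketch the second high-scoring frame is dropped because it arrives within the interval, which keeps upload volume bounded even when the trigger fires continuously.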
In some embodiments, additional metadata is collected and retained with the sensor data, such as location, road type, vehicle model, whether the vehicle is driving left or right, time of day, classifier score, length of time since last transmission of sensor data, and/or vehicle control parameters/operating conditions (such as speed, acceleration, steering, braking, steering angle, etc.). In various embodiments, the data and metadata are transmitted to a computer data server where they are used to create a new training data set to improve the application of the deep learning system for a particular use case. For example, retained sensor data associated with the identified tunnel exit is identified by the trigger classifier and used to create additional training data for identifying the tunnel exit.
In some embodiments, after uploading, the sensor data is reviewed and annotated to create a new training data set that is used to improve the autonomous driving characteristics of the vehicle. For example, the data may be annotated as positive samples of tunnel exits and may be used to supplement the original training data set, which includes many more use cases. A new machine learning model is trained using the newly compiled data set to improve the autonomous vehicle neural network, and is then deployed to the vehicle as an update to the autonomous vehicle system. The newly deployed machine learning model has an improved ability to detect the specific use case (e.g., tunnel exits) targeted by the trigger classifier. As one example, the improved model will have improved accuracy and performance in identifying tunnel exits. Additional examples of use cases include trigger classifiers trained to identify particular objects (e.g., shopping carts, animals, etc.), road conditions, weather, driving patterns, hazards, etc.
In various embodiments, trigger classifiers can be developed and deployed to a fleet of vehicles without updating core software for the vehicles (such as components of a deep learning system used for autonomous driving). New and updated trigger classifiers linked to and associated with existing neural network software of a vehicle can be pushed to the vehicle more frequently with little to no impact on core vehicle functions, such as driving, safety systems, and navigation, among others. For example, the trigger classifier may be trained to identify a cobblestone road, be deployed to a fleet of vehicles, and begin to aggregate images and related data of the cobblestone road within minutes. Using the disclosed techniques, the speed of gathering relevant training data for a particular use case is greatly increased with little to no impact on ongoing vehicle operations or the driver or passengers of the vehicle. A new trigger classifier can be deployed without the need for a lengthy and laborious installation process. The process may be performed remotely and dynamically, for example, using over-the-air updates, without the need to bring the vehicle to a service location. After such an update, the trigger classifier may begin scanning the captured images for any images that satisfy the trigger conditions and then upload images that satisfy those conditions as future training data objects.
In some embodiments, the sensor data is transmitted and received by different devices. For example, a vehicle equipped with autonomous driving technology includes sensors for gathering information related to its surroundings, and the vehicle receives sensor data from its sensors. In some embodiments, the vehicle is equipped with sensors such as cameras, ultrasonic sensors, radar sensors, LiDAR, and/or other suitable sensors to capture data related to autonomous driving. In some embodiments, a neural network is applied to the sensor data. For example, a convolutional neural network (CNN) is applied to received sensor data, such as an image of the road ahead of the vehicle. The CNN may be used to identify objects in the captured sensor data, and the results of applying the neural network are used to control the vehicle. As an example, lane lines are identified and used to keep the vehicle between the identified lane lines.
In some embodiments, a trigger classifier is applied to an intermediate output of the neural network to determine a classifier score for the sensor data. For example, the intermediate output of a layer is fed to the trigger classifier, which determines a classifier score for the sensor data. In some embodiments, the neural network includes multiple intermediate layers, and the particular intermediate output (and corresponding layer) from which the trigger classifier receives its input is configurable. For example, the trigger classifier may be configured to receive the output of the second-to-last, third-to-last, fourth-to-last, etc. layer. In some embodiments, the intermediate output is the output of any of the intermediate layers of the neural network. In some embodiments, the intermediate output may be the output of the first layer of the neural network. In some embodiments, a determination of whether at least a portion of the sensor data is to be transmitted via a computer network is made based at least in part on the classifier score. For example, the determination is made based on whether the classifier score exceeds a threshold required to retain the sensor data and transmit the data for further use. In some embodiments, the determination is made based on the classifier score and whether additional trigger classifier conditions are satisfied. Example conditions may filter the captured sensor data based on the location of the vehicle, the amount of time the vehicle has traveled, the type of vehicle, whether the autonomous driving feature has recently been disengaged, and so forth. In various embodiments, sensor data meeting the desired conditions and score threshold is transmitted to a computer server via a computer network, such as a WiFi or cellular network, for further processing. In some embodiments, the data is processed to create a new or additional training data set.
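The combined score-plus-conditions determination can be sketched as follows. The threshold value, the allowed road types, and the treatment of a recent disengagement are illustrative assumptions, not conditions fixed by the disclosure:

```python
def transmit_decision(score, metadata, threshold=0.6,
                      allowed_road_types=("highway", "tunnel")):
    """Decide whether to transmit sensor data, combining the classifier
    score with additional (illustrative) trigger conditions."""
    if score <= threshold:
        return False  # score gate: below threshold, never transmit
    if metadata.get("road_type") not in allowed_road_types:
        return False  # condition gate: filter by vehicle location/context
    if metadata.get("autopilot_recently_disengaged", False):
        return False  # condition gate: skip frames near a disengagement
    return True

yes = transmit_decision(0.8, {"road_type": "tunnel"})
no = transmit_decision(0.8, {"road_type": "residential"})
```

Structuring the conditions as independent gates makes it easy to push updated filters to a vehicle without retraining the classifier itself.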
In various embodiments, the training data includes training and validation data.
Example block diagrams
Fig. 1A illustrates a schematic diagram of a vehicle traveling along a road and gathering training data from its surroundings. In the illustrated example, the vehicle 102 is traveling along a road. The vehicle 102 may include sensors, such as cameras, radars, etc., such that the sensors capture information about the sensor volume 104 of the environment surrounding the vehicle 102. An example sensor 107 is illustrated in fig. 1A. For example, the vehicle 102 may acquire images of the surroundings of the vehicle 102. These acquired images may then be analyzed in an effort to understand the surrounding environment. For example, the images may be analyzed to classify objects represented in the images. In this example, the images may be analyzed to identify other vehicles, road markings, trees or other vegetation, obstacles in the road, pedestrians, signs, and so forth. As will be described in more detail below, the vehicle 102 can utilize machine learning techniques to analyze the sensor information. For example, one or more convolutional neural networks may be used to classify objects included in the example sensor volume 104. An example description of a deep learning system 700 that can be used by the vehicle 102 is included below with reference to figs. 1B and 7.
While the above-described machine learning techniques may be used to analyze sensor information, it should be appreciated that certain real-world objects or scenes may be difficult for the vehicle 102 to accurately understand or classify. For example, a tire 106 is shown located on the road on which the vehicle 102 is traveling. Being able to identify the tire 106 may enhance the safety and performance of the vehicle 102. As an example, if the tire 106 is in the path of the vehicle 102, the vehicle 102 may perform an autonomous driving technique to navigate around the tire 106. Additionally, even if the tire is not in the path of the vehicle 102, identifying the tire 106 may still affect autonomous driving of the vehicle 102. For example, another vehicle may suddenly swerve into the lane of the vehicle 102 to avoid the tire 106. In this example, identifying the tire 106 can therefore inform the vehicle 102's prediction of future movement (e.g., preemptively slowing down when another vehicle approaches the tire 106).
Accordingly, it may be beneficial for the vehicle 102 to accurately identify the tire 106 as being included in the sensor volume 104. However, as described above, being able to identify tires may require a significant amount of training data. The training data may include images of numerous tires in various configurations on various roads. The training data may be enhanced by including images of different tires on different roads. Additionally, the training data may be enhanced by images of different tires on different roads in different driving environments. For example, it may be advantageous to have an image of a tire partially covered in snow on a different road. As another example, it may be advantageous to have an image of a deflated tire on a dusty road. Gaining access to such images can present significant technical challenges.
As will be described, one or more classifiers may be trained to identify tires. For example, a classifier may be trained using a limited set of training examples. These classifiers can then be provided to the vehicle 102 via over-the-air (OTA) updates. For example, OTA updates can be received wirelessly by the vehicle 102 (e.g., over Wi-Fi, via a cellular signal such as an LTE network, etc.). The classifier can then analyze the sensor information acquired by the vehicle 102. If the classifier detects that a tire is depicted in the sensor information (e.g., an image), the vehicle 102 can transmit the sensor information to an external system for processing. The external system may aggregate such received sensor information to create a training data set for tires. As will be described, these training data sets may then be used to train complex machine learning models (e.g., convolutional neural networks) executing on the vehicle 102. In this manner, the machine learning model, and thus the ability of the vehicle 102 to perform autonomous driving tasks, may be enhanced.
FIG. 1B illustrates a block diagram of the generation of training data. In this illustration, the vehicle 102 is receiving sensor data 108. The sensor data 108 may include one or more images or videos depicting the tire 106 illustrated in fig. 1A. The sensor data 108 can be provided to a deep learning system 700 executing on one or more processors included in the vehicle 102. An example of aspects of the deep learning system 700 is illustrated in fig. 1B.
As illustrated, the deep learning system 700 may analyze the received sensor data 108 using example machine learning techniques, such as convolutional neural networks. As depicted in fig. 2, the sensor data 108 may be pre-processed (e.g., normalized, passed through a filter, etc.). It should be understood that a convolutional neural network may include numerous convolutional layers. These convolutional layers may apply convolutional filters such that output volumes are created. In some embodiments, one or more fully connected or dense layers may be used as final layers to classify features or objects included in the sensor data 108. As an example, one or more softmax layers or independent logistic classifiers may be used to classify features or objects. In this manner, the deep learning system 700 can identify real-world objects, scenes, etc. included in the sensor volume 104 around the vehicle 102. Based on identifying these real-world objects, scenes, etc., the vehicle 102 can perform autonomous driving tasks. Thus, the vehicle 102 may execute a convolutional neural network in its typical operation.
The deep learning system 700 includes one or more classifiers. For example, classifiers A-N 110A-110N are illustrated in FIG. 1B. These classifiers 110A-110N may have been received via OTA updates to the vehicle 102 (e.g., periodic updates provided to the vehicle). Prior to providing the classifiers 110A-110N to the vehicle 102, the entity may have trained them to identify respective features or objects represented in sensor data. For example, classifier A 110A may have been trained to identify snowy scenes. As another example, classifier N 110N may have been trained to identify bicycles, tires, etc. on a road. The entity may have trained the classifiers 110A-110N using limited training data. For example, classifier N 110N may have been trained using 100, 500, or 1000 examples of tires on a road, or of a particular type of tire on a particular type of road.
As illustrated, the classifiers 110A-110N can use information obtained from an intermediate layer of an example machine learning model (e.g., a convolutional neural network). For example, the features 112 may be obtained from an intermediate layer of a convolutional neural network. The classifiers can take advantage of this existing capability, as the convolutional neural network may already be trained to classify or otherwise identify features or objects in the sensor data. As an example, a convolutional neural network may learn to apply convolutional filters to learn features indicative of real-world objects. The convolutional neural network may then classify the features into a particular class or category corresponding to a real-world object.
The classifiers 110A-110N can thus be trained using information obtained from the intermediate layers of the convolutional neural network. For example, the classifier 110N may be trained using a limited training data set of images depicting tires. In this example, the images may be provided to an example convolutional neural network. At a particular intermediate layer of the convolutional neural network, the features 112 may be provided to the classifier 110N. The classifier 110N may then be trained to assign a high classifier score to images depicting tires. The classifier 110N may optionally be trained to assign a low classifier score to images that do not depict tires. In this manner, the classifier 110N may leverage a convolutional neural network that, as described above, may be used in typical operation of the vehicle 102.
As illustrated in FIG. 1B, the classifiers 110A-110N are receiving features 112 from an intermediate layer of a convolutional neural network. Alternatively, the classifiers 110A-110N may use features from different intermediate layers. For example, classifier N 110N may use features from a first layer (e.g., layer 4, 5, etc.), while classifier A 110A may use features from a second layer (e.g., layer 6, 7, etc.). During training, the particular layer from which features are received may be identified for each classifier. For example, a particular layer may be identified based on the accuracy of the correspondingly trained classifier with respect to a validation data set.
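Selecting the tap layer per classifier from validation accuracy can be sketched as follows; the candidate layers and the accuracy figures are invented for illustration:

```python
def choose_tap_layer(candidate_layers, validation_accuracy):
    """Pick the intermediate layer whose features yield the most accurate
    trigger classifier on a held-out validation set.

    `validation_accuracy` maps each candidate layer index to the accuracy
    of a classifier trained on that layer's features (assumed inputs).
    """
    return max(candidate_layers, key=lambda layer: validation_accuracy[layer])

# Hypothetical validation accuracies per candidate tap layer:
acc = {4: 0.81, 5: 0.87, 6: 0.84, 7: 0.79}
best = choose_tap_layer([4, 5, 6, 7], acc)
```

In this sketch each classifier records its chosen layer alongside its weights, so the vehicle knows which intermediate output to route to it at inference time.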
With respect to the tire 106 illustrated in FIG. 1A, one of the classifiers 110A-110N may be trained to identify a tire. For example, classifier N 110N may be trained to identify tires. In this example, classifier N 110N may assign a classifier score to the sensor data 108. In the illustrated example, classifier N 110N has assigned the sensor data 108 a classifier score greater than a threshold (e.g., 0.5, 0.7, etc.). The vehicle 102 can thus transmit the sensor data 108 to an external system (e.g., the training data generation system 120). For example, the vehicle may transmit the sensor data 108 over a network (e.g., the internet) via Wi-Fi, cellular service, or the like.
Thus, the external system 120 may receive sensor data 108 from a multitude of vehicles. For example, the external system 120 may receive images depicting tires from vehicles that pass in the immediate vicinity of the tires during their normal operation. Advantageously, these tires may be of different types, may be deflated or in a worn or degraded state, may be depicted under different road conditions, may be partially occluded, and so on. As an example, the classifiers 110A-110N may use classifier score thresholds that result in the transmission of a multitude of sensor data 108 to the external system 120. For example, a portion of the images transmitted to the system 120 may not include a tire. In some embodiments, a reviewing entity may therefore quickly review and discard certain images. The remaining images may be aggregated into a large training dataset and used to update a machine learning model executing on the vehicle. For example, a convolutional neural network may be trained to identify tires. Additionally, bounding boxes or other label information may be assigned to the images included in the aggregated training dataset.
In some embodiments, the vehicle 102 may store more classifiers than the number currently being executed by the vehicle 102. For example, the vehicle 102 may store 50, 75, or 100 classifiers. However, during operation of the vehicle 102, the vehicle 102 may execute 20, 30, 40, or more classifiers. For example, the vehicle 102 can determine a respective classifier score for a subset of all classifiers stored by the vehicle 102. Alternatively, each classifier may be executed for a particular period of time before being swapped for another classifier.
Additionally, the vehicle 102 can execute certain classifiers depending on one or more triggers. As an example, the vehicle 102 can receive information identifying a location or an approximate location that is known to have certain real-world objects or features, or to exhibit certain scenes. For example, the vehicle 102 can access map information that identifies tunnel exits in a particular area. In this example, the vehicle 102 may ensure that the classifier associated with identifying tunnel exits is executing as the vehicle 102 approaches a tunnel exit.
As another example, the external system 120 may optionally receive location information along with the received sensor data. Thus, the external system 120 may identify that a threshold number of vehicles have transmitted sensor data based on the same classifier for a particular real-world region. As an example, the external system 120 may identify that a particular on-ramp has an obstacle, or some type of obstruction, on the road. The system 120 can then communicate information to a portion of the vehicles, directing them to execute the same classifier when approaching the particular real-world region. In this manner, the system 120 may ensure that a greater amount of training data can be acquired based on the same classifier.
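A simple sketch of the server-side bookkeeping this implies (names and thresholds are hypothetical, not from the patent): the external system tallies transmissions per (classifier, region) pair and flags a region once a threshold number of distinct vehicles have reported from it.

```python
from collections import defaultdict

class RegionTriggerTracker:
    """Tracks which real-world regions have triggered the same classifier."""

    def __init__(self, vehicle_threshold=3):
        self.vehicle_threshold = vehicle_threshold
        # (classifier_id, region) -> set of vehicle ids that reported
        self._reports = defaultdict(set)

    def record(self, classifier_id, region, vehicle_id):
        """Record one sensor-data transmission with its location metadata."""
        self._reports[(classifier_id, region)].add(vehicle_id)

    def flagged_regions(self, classifier_id):
        """Regions where enough distinct vehicles triggered the same classifier."""
        return [region for (cid, region), vehicles in self._reports.items()
                if cid == classifier_id and len(vehicles) >= self.vehicle_threshold]
```

Vehicles approaching a flagged region could then be directed to execute that classifier, or to transmit sensor data outright.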
Further, the system 120 can direct a vehicle to transmit sensor data even if the classifiers described above do not assign a classifier score greater than a threshold. As an example, the system 120 may receive sensor data from a threshold number of vehicles in proximity to a real-world location. In this example, the system 120 may direct any vehicles within a threshold distance of the real-world location to transmit sensor data (e.g., images) even if their classifiers did not generate a classifier score greater than a threshold. Since a classifier may have been trained using a limited number of examples (e.g., 100, 1000, as described above), the classifier of a particular vehicle may be unable to identify the object, depending on the angle of the particular vehicle relative to the object. However, the sensor data may still be useful for generating a robust training set for the object. For example, an object may be partially visible in an image acquired by a particular vehicle, and the image may therefore be useful for inclusion in a larger training set used to identify the object. In this manner, the external system 120 may override the classifier and cause the particular vehicle to transmit the sensor data.
In all cases where the external system 120 receives location information or any identifying information, it is understood that the information may be anonymized. Additionally, such techniques may require affirmative consent (e.g., opt-in) by the user.
Example flow diagrams
FIG. 2 is a flow diagram illustrating an embodiment of a process for applying trigger classifiers to intermediate results of a machine learning model. In some embodiments, the process of FIG. 2 is utilized to collect and retain sensor data that satisfies a particular use case and is captured by sensors for use in a machine learning model for autonomous driving. For example, a particular use case may be associated with the identification of certain features, objects, scenes, and the like. In some embodiments, the process of FIG. 2 is implemented on an autonomous-driving-enabled vehicle, regardless of whether autonomous driving control is enabled. For example, sensor data may be collected immediately after autonomous driving is disengaged or while the vehicle is being driven by a human driver. In some embodiments, the techniques described in FIG. 2 may be applied to other deep learning systems outside the context of autonomous driving to improve training data sets, particularly for use cases where it is difficult to collect data. In various embodiments, the trigger classifier has been trained using training data designed for the use case and intermediate outputs of layers of a machine learning model.
In some embodiments, multiple triggers and/or multiple classifiers may be used together to identify sensor data for multiple use cases. For example, one trigger may be used to identify a tunnel, another a manhole, another a road bifurcation, and so on. In some embodiments, the functional components of the trigger classifier used to determine the classifier score and/or apply the required conditions are shared between different triggers. In some embodiments, each trigger is specified using a weighting vector, an optional bias, and one or more threshold metrics to compare to the classifier score. In some embodiments, additional required conditions such as time of day, vehicle location, road type, etc. are specified for a particular trigger. For example, a trigger may require that sensor data of tunnels be captured only at dawn and dusk. As another example, and useful for reducing duplicate data, a trigger may require that sensor data be captured at most every 30 minutes and only after the vehicle has traveled for at least 20 minutes. In various embodiments, the trigger threshold(s) and the required conditions are attributes specified for the trigger classifier.
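A trigger specified this way (weighting vector, optional bias, score threshold, and required conditions) can be sketched as a small data structure. The field names, default thresholds, and condition checks below are illustrative assumptions, not the patented format; the dawn/dusk, 30-minute, and 20-minute defaults mirror the examples above.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TriggerSpec:
    """One trigger: a linear classifier plus required conditions (illustrative)."""
    weights: np.ndarray
    bias: float = 0.0
    score_threshold: float = 0.5
    allowed_times: tuple = ("dawn", "dusk")  # required time-of-day condition
    min_minutes_between: int = 30            # capture at most every 30 minutes
    min_minutes_driven: int = 20             # only after 20 minutes of driving

    def score(self, intermediate_output: np.ndarray) -> float:
        """Classifier score: dot product of weighting vector and layer output, plus bias."""
        return float(self.weights @ intermediate_output + self.bias)

    def should_retain(self, intermediate_output, time_of_day,
                      minutes_since_last, minutes_driven) -> bool:
        """Apply required conditions first, then the score threshold."""
        if time_of_day not in self.allowed_times:
            return False
        if minutes_since_last < self.min_minutes_between:
            return False
        if minutes_driven < self.min_minutes_driven:
            return False
        return self.score(intermediate_output) > self.score_threshold
```

Checking the cheap required conditions before computing the dot product avoids scoring sensor data that would be filtered out anyway.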
At 201, sensor data is received. For example, a vehicle equipped with sensors captures sensor data and provides the sensor data to a neural network running on the vehicle. In some embodiments, the sensor data may be visual data, ultrasound data, LiDAR data, or other suitable sensor data. For example, an image is captured from a high dynamic range front camera. As another example, ultrasound data is captured from a side ultrasound sensor. In some embodiments, the vehicle is equipped with a plurality of sensors for capturing data. For example, in some embodiments, eight surround cameras are affixed to the vehicle and provide 360 degree visibility around the vehicle within a range of up to 250 meters. In some embodiments, the camera sensors include a wide angle front camera, a narrow angle front camera, a rear view camera, front view side cameras, and/or rear view side cameras. In some embodiments, ultrasonic and/or radar sensors are used to capture surrounding details. For example, twelve ultrasonic sensors may be affixed to the vehicle to detect both hard and soft objects. In some embodiments, a front radar is utilized to capture data of the surrounding environment. In various embodiments, radar sensors are able to capture details of the surroundings despite heavy rain, fog, dust, and other vehicles. Various sensors are used to capture the environment surrounding the vehicle, and the captured data is provided for deep learning analysis.
At 203, the sensor data is preprocessed. In some embodiments, one or more pre-processing passes (pass) may be performed on the sensor data. For example, the data may be preprocessed to remove noise, correct alignment issues and/or blurring, and so forth. In some embodiments, one or more different filtering passes are performed on the data. For example, high pass filtering may be performed on the data, and low pass filtering may be performed on the data to separate out different components of the sensor data. In various embodiments, the preprocessing step performed at 203 is optional and/or may be incorporated into a neural network.
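One way to separate sensor-data components with complementary filter passes, as described above, is a moving-average low-pass filter and its high-pass residual. This is a minimal sketch assuming a 1-D sensor trace; the window size is arbitrary and the patent does not specify a filter design.

```python
import numpy as np

def low_pass(signal, window=5):
    """Moving-average low-pass filter; edges handled by reflection padding."""
    kernel = np.ones(window) / window
    padded = np.pad(signal, window // 2, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")

def high_pass(signal, window=5):
    """High-pass component: the residual after removing the low-pass component."""
    return signal - low_pass(signal, window)
```

Summing the two components recovers the original signal, which is what makes this pair a clean decomposition for preprocessing.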
At 205, deep learning analysis of the sensor data is initiated. In some embodiments, deep learning analysis is performed on the sensor data, optionally preprocessed at 203. In various embodiments, deep learning analysis is performed using a neural network, such as a Convolutional Neural Network (CNN). In various embodiments, the machine learning model is trained offline and installed on the vehicle to perform inferences on the sensor data. For example, the model may be suitably trained to identify roadway lane lines, obstacles, pedestrians, moving vehicles, parked vehicles, drivable spaces, and the like. In various embodiments, the neural network includes multiple layers including one or more intermediate layers.
At 207, potential training data is identified. For example, sensor data that can be used to train a machine learning model is identified from the sensor data analyzed using deep learning analysis. In some embodiments, the identified training data is data associated with a particular use case. For example, a possible use case may involve identifying: curved roads, entrance ramps, exit ramps, entrances to tunnels, exits to tunnels, obstacles on roads, bifurcations on roads, road lane lines or markings, drivable spaces, road signs, sign content (e.g., words, numbers, symbols, etc.), and/or other features suitable for autonomous driving. In various embodiments, use cases depicted in sensor data are identified by using the intermediate outputs of layers of a neural network for deep learning analysis together with trigger classifiers. For example, the trigger classifier uses the output of an intermediate layer of the neural network to determine a classifier score. Sensor data whose classifier score passes the required conditions specified with the trigger and exceeds a threshold is identified as potential training data. In various embodiments, the threshold is utilized to identify positive examples of a use case. For example, a higher classifier score indicates a higher likelihood that the sensor data is representative of the use case. In some embodiments, the classifier score is a number between negative one and positive one, and a score closer to positive one is more likely to be representative of the target use case. In various embodiments, conditions specified by additional filters (such as time of day, vehicle type, and location) are used to identify sensor data for the target use case.
At 209, the identified sensor data is transmitted. For example, the sensor data identified at 207 is transmitted to a computer server for additional processing. In some embodiments, the additional processing includes creating a training set using the identified sensor data. In various embodiments, the sensor data is wirelessly transmitted from the vehicle to the data center, for example, via a WiFi or cellular connection. In some embodiments, metadata is transmitted with the sensor data. For example, the metadata may include the classifier score, time of day, a timestamp, location, vehicle type, vehicle controls, and/or operating parameters (such as speed, acceleration, braking, whether autonomous driving is enabled, steering angle, etc.). Additional metadata may include the time since the last transmission of previous sensor data, weather conditions, road conditions, and so on.
At 211, post-processing of the data is performed. In some embodiments, different post-processing techniques are utilized to enhance quality and/or reduce the amount of data required to represent the data. In some embodiments, the output of the deep learning analysis is merged with the results of the deep learning applied to other sensors. In some embodiments, post-processing is used to smooth the analysis performed on the different sensor data. The processed data may be used to control a vehicle. Additional information related to the data may also be processed at 211. For example, information such as the configuration of the autonomous driving system (including which autonomous driving features are enabled) may be combined with deep learning analysis. Other information may include vehicle operation and/or control parameters and/or environmental data (such as maps, terrain, and/or GPS data). In some embodiments, post-processing may include combining the results of deep learning analysis performed on data from other sensors to create a unified representation of the vehicle surroundings. In some embodiments, the post-processing step at 211 is an optional step.
At 213, the results of the deep learning analysis are provided to vehicle controls. For example, the results are used by a vehicle control module to control the vehicle for autonomous driving. In some embodiments, vehicle control may adjust the speed and/or steering of the vehicle. In various embodiments, vehicle control may be disabled, but intermediate results of the deep learning analysis at 205 are utilized to identify training data at 207 and transmit the identified sensor data at 209. In this manner, deep learning analysis may be utilized to identify and retain suitable training data even when the vehicle is not under the control of an autonomous driving system. In various embodiments, sensor data is identified and retained when the autonomous driving system is active.
FIG. 3 is a flow diagram illustrating an embodiment of a process for creating a trigger classifier using intermediate results of a machine learning model. In some embodiments, the process of FIG. 3 is utilized to train a trigger classifier to identify and retain relevant sensor data for a particular use case. For example, sensor data processed by a deep learning system during its regular use includes a subset of data that can be used as training data. Using intermediate results of the deep learning system for autonomous driving, the trigger classifier may be trained to identify use cases such as tunnel entrances, tunnel exits, road bifurcations, curved roads, entrance ramps, and other features useful for autonomous driving. By utilizing the intermediate results of a deep learning system with trigger classifiers, the efficiency of identification and collection can be greatly improved. In various embodiments, trained trigger classifiers are installed on deployed deep learning systems along with trigger attributes to collect and retain potential training data for relevant use cases. In some embodiments, the trigger classifier is a support vector machine, although other suitable classifiers may be used. For example, in some embodiments, the trigger classifier is a neural network and may include one or more intermediate layers. In some embodiments, the deployed deep learning system utilizes the process of FIG. 2.
At 301, training data is prepared. For example, positive and negative examples of a particular use case are prepared as training data. As one example, positive and negative examples of tunnel exits are collected and annotated. The curated and annotated data set is used to create a training set. In some embodiments, annotating includes labeling the data and can be performed by a human curator. In some embodiments, the format of the data is compatible with a machine learning model used on the deployed deep learning application. In various embodiments, the training data includes validation data for testing the accuracy of the trained model.
At 303, deep learning analysis is applied to the training data. For example, an existing machine learning model is used to initiate the deep learning process. In some embodiments, the deep learning model is a neural network, such as a Convolutional Neural Network (CNN) with multiple layers. In some embodiments, the CNN may include three or more intermediate layers. Examples of deep learning analysis include neural networks for autonomous driving. In various embodiments, the deep learning analysis is initiated by feeding the training data prepared at 301 to the neural network to produce intermediate layer results.
At 305, the trigger classifier is trained. In some embodiments, the trigger classifier is a support vector machine or a small neural network. In various embodiments, the input to the trigger classifier is the output of the first layer or an intermediate layer of the machine learning model of the deep learning system. The particular layer used for input may be configurable. For example, the output of the next-to-last layer, and so on, up to the first layer, may be utilized as input to train the trigger classifier. In various embodiments, the annotations of the training data are used along with the raw data (such as image data) to train the trigger classifier. Using positive and negative examples, the trigger classifier is trained to identify the likelihood that an input (e.g., an input derived from sensor data) is a match for a particular use case, such as a tunnel exit. In some embodiments, the results of the trained trigger classifier are validated using the validation dataset created at 301. In some embodiments, the trigger classifier is trained using an offline neural network that matches the neural network deployed on the vehicle.
In some embodiments, the final output of the neural network is a feature vector that identifies features of the input data (such as the raw image). The features may include the number of vehicles, the number of signs, the number of lanes, etc. in the raw data. The intermediate output of a layer (e.g., a layer processed before the final layer) includes semantic information about the original input data. In some embodiments, the intermediate output of a layer may be represented in vector form, and the vector has more elements than the vector output of the final layer. For example, the final output of the neural network may be a 32-element vector, while the output of the second-to-last layer may be a 64-element vector. In various embodiments, the outputs of the first and intermediate layers of the neural network (such as, for example, 64-element vectors) include a greater amount of semantic information associated with the raw input data than the output of the final layer of the neural network (such as, for example, a 32-element vector), and are therefore used to train the trigger classifier. In some embodiments, the particular layer selected for training the trigger classifier may be dynamically selected. For example, a particular intermediate layer (such as an earlier layer) may be selected based on an increase in accuracy compared to another layer (such as a layer closer to the final layer). In some embodiments, a particular layer is selected based on the efficiency of utilizing that layer. For example, if the result of using a layer with a smaller output vector meets the accuracy requirement, that layer may be selected.
In some embodiments, inputs from different intermediate layers are used to train more than one trigger classifier, and the different trained classifiers are compared to one another. A balance between accuracy and performance is used to determine which of the multiple classifiers to use. For example, for some use cases the output of an earlier intermediate layer is necessary, while for other use cases the output of a later intermediate layer is sufficient. The best intermediate layer can be determined by comparing multiple trained trigger classifiers. In various embodiments, the selection of which layer of the neural network to receive intermediate results from is made dynamically as part of the trigger classifier training process.
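The layer-selection tradeoff described above (accuracy versus output-vector size) can be sketched as a simple selection rule. The candidate tuples and the accuracy requirement are hypothetical inputs, assumed to come from validating one trained trigger classifier per candidate layer.

```python
def select_layer(candidates, accuracy_requirement=0.9):
    """Pick which layer's output should feed the trigger classifier.

    candidates: list of (layer_name, validation_accuracy, output_size) tuples,
    one per trigger classifier trained on a different layer's output.
    Prefers the smallest output vector whose classifier meets the accuracy
    requirement (cheaper dot products at inference); if none meets it,
    falls back to the most accurate layer.
    """
    meeting = [c for c in candidates if c[1] >= accuracy_requirement]
    if meeting:
        return min(meeting, key=lambda c: c[2])[0]
    return max(candidates, key=lambda c: c[1])[0]
```

Earlier layers tend to have larger, more semantically rich outputs; this rule only pays that cost when a smaller, later layer cannot meet the accuracy requirement.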
In some embodiments, the trained classifier may be specified by a vector and a bias factor. For example, the trained classifier may be a vector of weights that is offset by a bias factor to determine a classifier score. In some embodiments, the number of elements of the vector is the same as the number of elements of the output of the intermediate layer used and the number of elements of the input used to train the classifier. For example, where the output of the intermediate layer used to train the classifier has 1024 elements, the input data used to train the trigger classifier has 1024 elements, and the resulting trigger classifier can be represented as a 1024-element weight vector and a bias. In some embodiments, the bias is optional and may be folded into the elements of the weighting vector.
At 307, trigger attributes for the classifier trained at 305 are determined. For example, a threshold may be determined that is compared to a classifier score determined by the trained trigger classifier. A classifier score that exceeds the threshold indicates that the raw input associated with the score is likely to be a positive example of the target use case. For instance, a trigger classifier trained to identify tunnel exits determines a classifier score; a score of 0.7 against a threshold of 0.5 indicates that the data is likely to represent a tunnel exit. In some embodiments, a score of -1.0 is a negative example, and a score of 1.0 is a positive example. The classifier score lies between -1.0 and 1.0 to indicate the likelihood that the original input is a positive or negative example of the target use case.
In some embodiments, the trigger attributes include required conditions, such as a trigger filter. A trigger filter is a filter used to limit retention of sensor data to the specified conditions. For example, sensor data may be triggered for retention based on a location associated with the data. Other examples include the length of time since sensor data was last triggered and positively identified, the length of time since the drive was started, the time of day, location, road type, and so on. In various embodiments, one or more trigger attributes may be specified to limit the conditions under which the trigger classifier is used to collect and retain sensor data.
At 309, the trigger classifier and trigger attributes are deployed. For example, the trigger classifier, along with the attributes the trigger classifier uses to retain sensor data, is installed with the deep learning system. For example, the trigger classifier and attributes may be packaged into a small binary that is wirelessly transmitted to the vehicle. In some embodiments, the packaged trigger classifier and attributes are transmitted as an over-the-air update using a wireless technology such as WiFi or cellular network connectivity. Once received on the vehicle, the trigger classifier and attributes are installed as part of the autonomous driving system. In some embodiments, only the trigger classifier is installed. In some embodiments, the trigger classifier and the deep learning model for autonomous driving are installed together. In various embodiments, the machine learning model of the autonomous driving system matches the one used to train the trigger classifier.
FIG. 4 is a flow diagram illustrating an embodiment of a process for identifying potential training data using a trigger classifier. In some embodiments, the trigger classifier operates in conjunction with a deep learning system. For example, a deep learning system that uses a machine learning model matching the model used to train the trigger classifier is utilized with the trigger classifier as part of an autonomous driving system. The trigger classifier analyzes sensor data that is at least partially analyzed by the deep learning system to identify whether the sensor data satisfies a particular use case warranting retention of the sensor data. The sensor data is then transmitted to a computer server and can be used to create training data for an improved machine learning model having improved performance in identifying particular use cases. Examples of use cases include identifying an entrance ramp, a tunnel exit, an obstacle on a road, a bifurcation on a road, a particular type of vehicle, and so forth. In some embodiments, trigger parameters are used to configure the conditions under which the trigger classifier identifies relevant results. In some embodiments, one or more trigger classifiers and parameters are used to identify one or more different use cases. In some embodiments, the process of FIG. 4 is performed at 205, 207, 209, 211, and/or 213 of FIG. 2. In some embodiments, the process of FIG. 3 is used to train the trigger classifier used in the process of FIG. 4.
At 401, deep learning analysis is initiated. For example, deep learning analysis of an autonomous driving system is initiated using sensor data captured by sensors attached to a vehicle. In some embodiments, the initiated deep learning analysis includes preprocessing of the sensor data. In various embodiments, the deep learning analysis utilizes a trained machine learning model having multiple layers including one or more intermediate layers. In some embodiments, the output of the first layer and any intermediate layers is considered an intermediate output. In various embodiments, the intermediate output is an output of a layer of the machine-learned model other than the final output (e.g., an output of a final layer of the model).
At 403, inference for one layer of the deep learning analysis is completed. For example, a neural network includes multiple layers, including intermediate layers followed by a final layer. The output of each layer (e.g., an intermediate result) is fed as input to the next layer. In some embodiments, the output of the first layer and each intermediate layer is considered an intermediate result. In various embodiments, the output of a single layer is a vector, which may be used as the input to the next layer. In some embodiments, the input to the first layer of the neural network is sensor data, such as image data. In some embodiments, the neural network is a convolutional neural network.
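The layer-by-layer loop described here, with an intermediate result tapped off for the trigger classifier, can be sketched as follows. This is an illustrative simplification: each layer is modeled as a callable on a vector, and the tap index is assumed to be configured per trigger.

```python
import numpy as np

def run_with_tap(layers, x, tap_index):
    """Run a layered model; return the final output and one intermediate result.

    layers: list of callables, where each layer's output feeds the next layer.
    tap_index: index of the layer whose output is handed to the trigger classifier.
    """
    intermediate = None
    for i, layer in enumerate(layers):
        x = layer(x)                 # feed this layer's output to the next layer
        if i == tap_index:
            intermediate = x.copy()  # intermediate result for the trigger classifier
    return x, intermediate
```

The final output still flows to vehicle control as usual; only a copy of the tapped layer's output is diverted to trigger-classifier scoring.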
At 405, the following determination is made: the output of the layer analysis performed at 403 is the result of the final layer of the neural network. In the event that the output is not the result of the final layer, e.g., the output is an intermediate result, then processing continues to 409. In the case where the output is the result of the final layer of the neural network, then the inference performed using the machine learning model is complete and the process continues to 407. In some embodiments, the output at 405 provided to 407 is a feature vector.
At 407, the results of performing the deep learning analysis on the sensor data are provided to the vehicle control. In some embodiments, the results are post-processed. For example, results of one or more different neural networks for inputs from one or more different sensors may be combined. In some embodiments, a vehicle control module is used to control operation of a vehicle to achieve vehicle control. For example, vehicle controls can modify the speed, steering, acceleration, braking, etc. of the vehicle for autonomous driving. In some embodiments, the vehicle control may enable or disable turn signals, brake lights, headlights, and/or other controls/signals that operate the vehicle, including network controls such as sending network messages via a wireless network, such as WiFi or a cellular network. In various embodiments, vehicle control may not be enabled to actively control the vehicle, for example, when the autonomous driving feature is disabled. For example, deep learning analysis at 401 and 403 is performed to provide the results as input to a trigger classifier so that potential training data can be identified even when the autonomous driving system is not actively controlling the vehicle.
At 409, a determination is made as to whether the layer of the neural network and the trigger conditions are appropriate for applying the trigger classifier. For example, the trigger attributes indicate the conditions required for applying the trigger classifier. Examples of conditions include whether the length of time since the last capture has exceeded a minimum amount of time, whether a minimum length of travel time has elapsed, whether the time of day is within a certain range, and so forth. Examples of different times of day may include dawn, dusk, day, night, etc. Additional condition requirements may be based on location, weather, road conditions, road type, vehicle type, disengagement of autonomous driving features, steering angle (e.g., exceeding a steering angle threshold), change in acceleration, activation of brakes, or other suitable features. Examples of different weather conditions may include snow, hail, sleet, rain, heavy rain, overcast skies, sunny skies, cloudy skies, fog, etc. Different conditions may be specified by the trigger attributes. In some embodiments, different use cases may utilize different trigger attributes and the intermediate results of different layers of the neural network. For example, some use cases may be more efficient and produce high quality results using the intermediate results of later layers of the neural network. Other use cases may require earlier intermediate results in order to identify useful examples of sensor data that satisfy the use case. In some cases, a plurality of condition checks and logical operators, such as AND and OR operators, may be used to nest the trigger attributes that specify the conditions for applying the trigger classifier.
At 411, a trigger classifier score is determined. For example, the trigger classifier score is determined by applying the trigger classifier to the intermediate results of the neural network. In some embodiments, applying the trigger classifier utilizes a weighting vector, and optionally a bias, to determine a classifier score associated with the sensor data. In some embodiments, the trigger classifier is a support vector machine or a neural network. In some embodiments, the performance of the trigger classifier is improved by running the classifier on a custom Artificial Intelligence (AI) processor. For example, the AI processor may perform a dot product operation on two vectors in very few cycles and/or perform multiple dot products with few wasted cycles. In some embodiments, the determined classifier score is a floating point number that represents the likelihood that the sensor data is a positive (or negative) example of the target use case. For example, a particular range (such as between -1 and +1) may be used to represent the likelihood that the sensor data is a negative or positive example of the target use case.
At 413, a determination is made as to whether the classifier score exceeds a threshold and whether the required trigger conditions are met. For example, in some embodiments, the classifier score is compared to a threshold. In the event that the classifier score exceeds the threshold, processing continues to 415. In the event that the classifier score does not exceed the threshold, processing continues to 403. In some embodiments, additional trigger-required conditions may be applied after the classifier score is determined. For example, the determined classifier score may be compared to previously determined classifier scores within a certain time window. As another example, the determined classifier score may be compared to previously determined scores from the same location. As another example, sensor data may be required to satisfy both temporal and location conditions. For example, only the sensor data from the same location within the last 10 minutes with the highest score is retained as potential training data. In various embodiments, the conditions may include trigger attributes that act as a filter determining whether sensor data is transmitted. In some embodiments, the conditions at 413 are optional and only the classifier score is compared to a threshold.
In some embodiments, there are separate thresholds for positive and negative examples. For example, thresholds of +0.5 and -0.5 may be utilized to identify positive and negative sensor data as potential training data. A classifier score between +0.5 and +1.0 identifies positive examples and a classifier score between -1.0 and -0.5 identifies negative examples. In some embodiments, only positive examples are retained for transmission.
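The gating at 413, including the dual positive/negative thresholds and the same-location time-window condition from the examples above, can be sketched as follows. The thresholds and the 10-minute window follow the figures in the text; the bookkeeping structure and all names are illustrative assumptions.

```python
import time

POSITIVE_THRESHOLD = 0.5   # scores in (+0.5, +1.0] flag positive examples
NEGATIVE_THRESHOLD = -0.5  # scores in [-1.0, -0.5) flag negative examples
TIME_WINDOW_S = 600        # retain only the best score per location per 10 minutes

# Best score seen per location within the current time window.
_best_by_location = {}

def should_transmit(score, location, now=None):
    """Return True when the classifier score and trigger conditions are met.

    The score must exceed the positive threshold (or fall below the
    negative one), and within the time window only the highest-scoring
    sample from the same location is retained.
    """
    now = time.time() if now is None else now
    if not (score > POSITIVE_THRESHOLD or score < NEGATIVE_THRESHOLD):
        return False
    prev = _best_by_location.get(location)
    if prev is not None:
        prev_score, prev_time = prev
        if now - prev_time < TIME_WINDOW_S and abs(score) <= abs(prev_score):
            return False  # a better sample from this location is already retained
    _best_by_location[location] = (score, now)
    return True

decision = should_transmit(0.82, location=(22.305, 114.170), now=0.0)
```

In a deployment, the condition set would be configurable via the trigger attributes rather than hard-coded constants.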
At 415, the identified sensor data is transmitted. For example, the identified sensor data is transmitted to a computer server (e.g., training data generation system 120), where it may be used to create training data. In various embodiments, the training data includes a training data set and a validation data set. In some embodiments, the transmitted sensor data includes metadata. Examples of metadata may include a time of day, a timestamp, road conditions, weather conditions, a location, a vehicle type, whether the vehicle is a left- or right-hand-drive vehicle, a classifier score, a use case, an identifier of a neural network, an identifier of a trigger classifier, a firmware version associated with an autonomous driving system, or other suitable metadata associated with the sensor data and/or the vehicle. In some embodiments, the time of day may indicate a period of time such as dusk, dawn, night, day, full moon, solar eclipse, and the like. For example, identifiers of the neural network and/or the trigger classifier may be transmitted to identify the particular trained machine learning models used to train the trigger classifier and to determine the classifier score. In some embodiments, the sensor data and/or metadata is first compressed before being transmitted. In some embodiments, the sensor data is sent in batches to more efficiently communicate the sensor data. For example, multiple images of sensor data are compressed together and a series of sensor data is transmitted as a batch.
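The metadata packaging, compression, and batched transmission described at 415 might look like the following sketch. The field names, JSON encoding, and gzip compression are assumptions for illustration; the patent does not specify a wire format.

```python
import gzip
import json

def package_batch(samples):
    """Bundle identified sensor samples with metadata and gzip the batch.

    Each sample pairs raw sensor bytes with a metadata dict (timestamp,
    location, classifier score, model identifiers, and so on).
    """
    batch = []
    for data, meta in samples:
        batch.append({
            "sensor_data": data.hex(),  # hex-encode raw bytes for JSON transport
            "metadata": meta,
        })
    payload = json.dumps(batch).encode("utf-8")
    return gzip.compress(payload)

# Hypothetical metadata for one identified sample.
meta = {
    "timestamp": 1536940800,
    "location": [22.3, 114.2],
    "classifier_score": 0.82,
    "neural_network_id": "nn-v12",
    "trigger_classifier_id": "tc-fork-03",
    "firmware_version": "2018.36.1",
}
compressed = package_batch([(b"\x00\x01\x02", meta)])
```

The receiving server (e.g., training data generation system 120) would reverse these steps before review and annotation.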
FIG. 5 is a flow diagram illustrating an embodiment of a process for creating training data from data corresponding to use cases identified by a trigger classifier. For example, the received sensor data is processed to create training data for training a machine learning model. In some embodiments, the sensor data corresponds to driving data captured via an autonomous driving system utilizing a trigger classifier. In some embodiments, the sensor data is received using the process of fig. 4 from a trigger classifier trained using the process of fig. 3. In some embodiments, the sensor data corresponds to sensor data captured based on a particular use case, such as identification of a bifurcation in a road, an on-ramp, an off-ramp, a tunnel entrance, and so forth. In some embodiments, the received sensor data corresponds to only positive examples of a use case. In some embodiments, the sensor data includes both positive examples and negative examples. In various embodiments, the sensor data includes metadata such as a classifier score, a location, a time of day, or other suitable metadata.
At 501, sensor data satisfying a trigger condition is received. For example, sensor data corresponding to a particular target use case is received and may be used as potential training data. In various embodiments, the sensor data is in a format usable as input to a machine learning model. For example, the sensor data may be raw or processed image data. In some embodiments, the data is data captured from an ultrasonic sensor, radar, LiDAR sensor, or other suitable sensor technology. In various embodiments, the trigger condition is specified using a trigger classifier and trigger attributes, as described with respect to figs. 2-4.
At 503, the sensor data is converted to training data. For example, the sensor data received at 501 includes data identified as potentially useful training data. In some embodiments, the received sensor data was compressed to transmit it efficiently from the remote vehicle and is first decompressed. In some embodiments, the data is reviewed to determine whether the sensor data accurately represents the target use case. For example, for a target use case of identifying tunnel exits, the data is reviewed to determine whether the raw sensor data is indeed sensor data of a tunnel exit. In some embodiments, a high accuracy machine learning model is used to confirm whether the sensor data represents the target use case. In some embodiments, a human reviews and confirms whether the sensor data represents the target use case. In some embodiments, useful data for training is annotated. For example, data may be marked as positive or negative examples. In some embodiments, the data is annotated and target objects may be labeled. For example, lane markings, signs, traffic lights, etc. may be annotated depending on the target use case. In various embodiments, the annotations may be used to train and/or validate the trained machine learning model.
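The conversion at 503, with its review-and-annotate step, can be sketched as a filter over the received samples. The review function below is a stand-in for either the high accuracy model or the human reviewer mentioned above; all names and the toy labeling rule are illustrative assumptions.

```python
def convert_to_training_examples(received, review_fn):
    """Filter received sensor samples into annotated training examples.

    `review_fn` represents the review step (a high accuracy model or a
    human reviewer) and returns "positive", "negative", or None when the
    sample does not actually represent the target use case.
    """
    examples = []
    for sample in received:
        label = review_fn(sample)
        if label is None:
            continue  # reviewed and rejected: not the target use case
        examples.append({"data": sample, "label": label})
    return examples

# Toy review function: treat samples tagged "tunnel_exit" as positives.
def toy_review(sample):
    if "tunnel_exit" in sample:
        return "positive"
    if "not_tunnel" in sample:
        return "negative"
    return None  # e.g., an unusable or mis-triggered capture

examples = convert_to_training_examples(
    ["tunnel_exit_001", "not_tunnel_007", "blurry_999"], toy_review)
```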
At 505, the training data converted at 503 is prepared as training and validation data sets. In various embodiments, the sensor data converted at 503 is prepared as a data set for training and a validation data set for validating the machine learning model. In some embodiments, the training data of 503 is incorporated into an existing training data set. For example, an existing training data set that covers most use cases may be merged with the most recently converted training data to improve coverage of a particular use case. The newly converted training data helps improve the accuracy of the model in identifying the particular use case. In some embodiments, portions of the existing training data are discarded and/or replaced with new training data.
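The preparation at 505, merging the newly converted examples into an existing corpus and carving out a validation set, might be sketched as follows. The 20% validation fraction and the random split are assumptions; the patent does not prescribe a split strategy.

```python
import random

def prepare_datasets(existing, new, validation_fraction=0.2, seed=0):
    """Merge newly converted examples into an existing training corpus,
    then split the result into training and validation sets."""
    merged = list(existing) + list(new)
    rng = random.Random(seed)  # deterministic shuffle for reproducibility
    rng.shuffle(merged)
    n_val = max(1, int(len(merged) * validation_fraction))
    return merged[n_val:], merged[:n_val]  # (training set, validation set)

# Hypothetical: 80 existing examples plus 20 newly converted ones.
train, val = prepare_datasets(existing=list(range(80)), new=list(range(100, 120)))
```

Replacing portions of the existing data, as mentioned above, would simply drop items from `existing` before merging.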
At 507, the machine learning model is trained. For example, the data prepared at 505 is used to train a machine learning model. In some embodiments, the model is a neural network, such as a Convolutional Neural Network (CNN). In various embodiments, the model includes a plurality of intermediate layers. In some embodiments, the neural network may include multiple layers including multiple convolution and pooling layers. In some embodiments, the trained model is validated using a validation data set created from the received sensor data.
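The train-then-validate flow of 507 can be illustrated with a deliberately simple stand-in model. The patent's model is a convolutional neural network; the logistic-regression loop below only demonstrates the flow of fitting on the training split and then checking accuracy on the validation split, with all names and hyperparameters assumed.

```python
import math
import random

def train_and_validate(train_set, val_set, epochs=200, lr=0.5):
    """Fit a logistic-regression stand-in on the training split, then
    report accuracy on the validation split (the validation of 507)."""
    dim = len(train_set[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in train_set:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    correct = sum(
        ((sum(wi * xi for wi, xi in zip(w, x)) + b) > 0) == bool(y)
        for x, y in val_set)
    return correct / len(val_set)

# Linearly separable toy data: label 1 when x0 + x1 exceeds 1.
rng = random.Random(0)
data = [((a / 10.0, c / 10.0), int(a + c > 10)) for a in range(11) for c in range(11)]
rng.shuffle(data)
val_accuracy = train_and_validate(data[:90], data[90:])
```

A real CNN would replace this loop with a deep learning framework, but the data handling — train on one split, measure on the held-out split — is the same.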
At 509, the trained machine learning model is deployed. For example, the trained machine learning model is installed on a vehicle as an update to the autonomous driving system. For example, the new model may be installed using an over-the-air update. In some embodiments, the update is a firmware update transmitted over a wireless network, such as WiFi or a cellular network. In some embodiments, the new model is used to train a new trigger classifier. In various embodiments, existing trigger classifiers based on the old model are retired, and new trigger classifiers based on the newly trained model are deployed. In some embodiments, the new machine learning model is installed when the vehicle is serviced.
FIG. 6 is a flow chart illustrating an embodiment of a process for causing selection of a classifier on a vehicle. The process may optionally be implemented by a vehicle, such as by one or more processors of the vehicle. For example, the vehicle may have numerous classifiers stored. In this example, the vehicle may execute a subset of the classifiers to conserve processing resources. For example, the vehicle may determine classifier scores for only the subset. As depicted in fig. 1B, the vehicle may periodically update the subset (e.g., select a new classifier after a threshold amount of time). In some embodiments, the vehicle may receive information from an external system (e.g., system 120) identifying that the vehicle is to execute one or more particular classifiers.
At block 601, the vehicle executes a classifier. As described above, the vehicle may acquire sensor data and determine a classifier score based on the sensor data.
At block 603, the vehicle receives a trigger to select a new classifier. The vehicle may monitor its position, for example via a Global Navigation Satellite System (GNSS) receiver. In some embodiments, the vehicle may access map information. The map information may identify certain features or use cases for which it may be advantageous to obtain training data. As an example, the map information may identify a tunnel exit. As another example, the map information may identify a side road that is partially occluded or hidden. As another example, the map information may identify the location of a particular style or form of cycle lane (e.g., an elevated or offset cycle lane). The vehicle may determine when it is near a particular feature or use case (e.g., within a threshold distance). The vehicle may then obtain information identifying a new classifier associated with the particular feature or use case. The new classifier may then be executed by the vehicle to determine a classifier score for the received sensor data.
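The proximity check that triggers classifier selection can be sketched with a haversine distance against illustrative map features. The feature list, classifier identifiers, and 500-meter threshold are assumptions; the text only requires that the vehicle be within a threshold distance of a mapped feature.

```python
import math

# Illustrative map features: each pairs a location with a suited classifier.
MAP_FEATURES = [
    {"name": "tunnel_exit", "lat": 22.305, "lon": 114.170,
     "classifier_id": "tc-tunnel-01"},
    {"name": "occluded_side_road", "lat": 22.280, "lon": 114.160,
     "classifier_id": "tc-sideroad-02"},
]

THRESHOLD_M = 500.0  # select a feature's classifier within this distance

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GNSS fixes."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def select_classifier(lat, lon, current_id):
    """Swap in a map feature's classifier when the vehicle is close enough."""
    for feature in MAP_FEATURES:
        if haversine_m(lat, lon, feature["lat"], feature["lon"]) <= THRESHOLD_M:
            return feature["classifier_id"]
    return current_id  # no nearby feature: keep the current classifier

selected = select_classifier(22.3052, 114.1702, "tc-default")
```

The same distance test, applied server-side across reported vehicle positions, matches the external-system variant described below.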
Additionally, the vehicle may transmit the location information to an external system. The external system may then communicate information to the vehicle about the new classifier or classifiers that the vehicle is to perform. For example, an external system may transmit a unique identifier associated with each classifier. As depicted in fig. 1B, the external system may have received information from the same classifier executed on at least a certain number of vehicles (e.g., 1, 3, 10, 20). The vehicles may already be within a threshold distance (e.g., radius) of each other such that the external system determines the presence of features or use cases near the locations of the vehicles. Thus, if the vehicle is within a threshold distance of the location, the external system may direct the vehicle to perform the same classifier. In this manner, the external system may acquire sensor data associated with the classifier.
At block 605, the vehicle executes a new classifier. As described herein, the new classifier can obtain information from an intermediate layer of a machine learning model (e.g., a convolutional neural network). The vehicle then determines a classifier score at block 607. The vehicle then transmits sensor data (e.g., an image) based on the classifier score exceeding a threshold at block 609. As described above, sensor data may be transmitted along with metadata.
FIG. 7 is a block diagram illustrating an embodiment of a deep learning system for identifying potential training data. For example, the block diagram includes different components of a deep learning system coupled with a trigger classifier for autonomous driving, where a subset of the sensor data captured for autonomous driving is identified as potential training data. In some embodiments, the deep learning system may passively analyze the sensor data, and intermediate outputs of layers of the deep learning system are used as inputs to the trigger classifier. In some embodiments, the deep learning system actively analyzes and controls the operation of the vehicle while also identifying and retaining potentially useful sensor data for creating additional training data. In some embodiments, the autonomous driving system is used for self-driving or driver-assisted operation of a vehicle. In various embodiments, the processes of figs. 2-6 utilize components of a deep learning system, such as the system described in fig. 7.
In the example shown, the deep learning system 700 includes a sensor 701, an image preprocessor 703, a deep learning network 705, an Artificial Intelligence (AI) processor 707, a vehicle control module 709, a network interface 711, and a trigger classifier module 713. In various embodiments, different components are communicatively connected. For example, sensor data from the sensor 701 is fed to the image preprocessor 703. The processed sensor data of the image preprocessor 703 is fed to the deep learning network 705 running on the AI processor 707. The output of the deep learning network 705 running on the AI processor 707 is fed to the vehicle control module 709. Intermediate results of the deep learning network 705 running on the AI processor 707 are fed to the trigger classifier module 713. Sensor data retained by the trigger classifier module 713 for communication is sent via the network interface 711. In some embodiments, the trigger classifier module 713 runs on the AI processor 707. In various embodiments, based on the autonomous operation of the vehicle and/or the results of the trigger classifier module 713, the network interface 711 is used to communicate with a remote server, place a phone call, send and/or receive a text message, transmit sensor data identified by the trigger classifier module 713, and so on. In some embodiments, the deep learning system 700 may include additional or fewer components, as appropriate. For example, in some embodiments, the image preprocessor 703 is an optional component. As another example, in some embodiments, post-processing is performed on the output of the deep learning network 705 using a post-processing component (not shown) before the output is provided to the vehicle control module 709.
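The dataflow among the components of FIG. 7 can be sketched with plain callables standing in for the modules. The toy component implementations and the 0.5 threshold are illustrative assumptions; only the wiring reflects the description above.

```python
def run_pipeline(raw, preprocess, network, trigger_classifier, control, transmit,
                 threshold=0.5):
    """Wire the FIG. 7 dataflow: sensor data flows through the image
    preprocessor into the deep learning network; the network's final
    output drives vehicle control, while an intermediate result feeds
    the trigger classifier, which gates transmission of the raw data."""
    processed = preprocess(raw)
    final_output, intermediate = network(processed)
    control(final_output)                     # vehicle control module 709
    score = trigger_classifier(intermediate)  # trigger classifier module 713
    if score > threshold:
        transmit(raw)                         # network interface 711
    return score

# Toy stand-ins for the real components.
events = []
score = run_pipeline(
    raw=[1, 2, 3],
    preprocess=lambda x: [v / 3 for v in x],
    network=lambda x: (sum(x), x),  # (final output, intermediate result)
    trigger_classifier=lambda inter: max(inter),
    control=lambda out: events.append(("control", out)),
    transmit=lambda data: events.append(("transmit", data)),
)
```

Note that vehicle control always receives the network output, while transmission happens only when the classifier score clears the threshold, mirroring the passive identification described above.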
In some embodiments, the sensor 701 includes one or more sensors. In various embodiments, the sensor 701 may be secured to the vehicle at different locations of the vehicle and/or oriented in one or more different directions. For example, the sensor 701 may be secured to the front, sides, rear, and/or roof, etc. of the vehicle in a forward-facing, rear-facing, side-facing, etc. direction. In some embodiments, the sensor 701 may be an image sensor, such as a high dynamic range camera. In some embodiments, the sensor 701 comprises a non-visual sensor. In some embodiments, the sensor 701 includes radar, LiDAR, and/or ultrasonic sensors, among others. In some embodiments, the sensor 701 is not mounted to the vehicle with the vehicle control module 709. For example, the sensor 701 may be mounted on an adjacent vehicle and/or fixed to the road or environment and included as part of a deep learning system for capturing sensor data.
In some embodiments, an image preprocessor 703 is used to preprocess the sensor data of the sensor 701. For example, the image pre-processor 703 may be used to pre-process the sensor data, split the sensor data into one or more components, and/or post-process one or more components. In some embodiments, the image preprocessor 703 is a Graphics Processing Unit (GPU), a Central Processing Unit (CPU), an image signal processor, or a dedicated image processor. In various embodiments, the image preprocessor 703 is a tone mapper processor for processing high dynamic range data. In some embodiments, the image preprocessor 703 is implemented as part of an Artificial Intelligence (AI) processor 707. For example, the image preprocessor 703 may be a component of the AI processor 707.
In some embodiments, deep learning network 705 is a deep learning network for implementing autonomous vehicle control. For example, the deep learning network 705 may be an artificial neural network, such as a Convolutional Neural Network (CNN), that is trained using sensor data, and the output of which is provided to the vehicle control module 709. In some embodiments, a copy of the neural network of the deep learning network 705 is used to create the trigger classifier of the trigger classifier module 713.
In some embodiments, Artificial Intelligence (AI) processor 707 is a hardware processor for running deep learning network 705 and/or trigger classifier module 713. In some embodiments, the AI processor 707 is a dedicated AI processor for performing inference on the sensor data using a Convolutional Neural Network (CNN). In some embodiments, the AI processor 707 is optimized for the bit depth of the sensor data. In some embodiments, the AI processor 707 is optimized for deep learning operations (such as neural network operations including convolution, dot product, vector, and/or matrix operations, among others). In some embodiments, the AI processor 707 is implemented using a Graphics Processing Unit (GPU). In various embodiments, the AI processor 707 is coupled to a memory configured to provide the AI processor with instructions that, when executed, cause the AI processor to perform deep learning analysis on the received input sensor data and determine machine learning results used to at least partially autonomously operate the vehicle. In some embodiments, the AI processor 707 is configured to output intermediate results of one or more layers of the deep learning network 705 to the trigger classifier module 713 for use in determining a classifier score.
In some embodiments, the vehicle control module 709 is utilized to process the output of the Artificial Intelligence (AI) processor 707 and convert the output to vehicle control operations. In some embodiments, the vehicle control module 709 is utilized to control the vehicle for autonomous driving. In some embodiments, the vehicle control module 709 may adjust the speed and/or steering of the vehicle. For example, the vehicle control module 709 may be used to control the vehicle by braking, steering, changing lanes, accelerating, merging into another lane, and the like. In some embodiments, the vehicle control module 709 is used to control vehicle lighting, such as brake lights, turn signals, headlights, and the like. In some embodiments, the vehicle control module 709 is used to control vehicle audio, such as the vehicle's sound system, to play audio alerts, enable a microphone, enable a speaker, and the like. In some embodiments, the vehicle control module 709 is used to control a notification system, including a warning system, to notify the driver and/or passengers of a driving event, such as a potential collision or approach to a predetermined destination. In some embodiments, the vehicle control module 709 is used to adjust sensors, such as the sensor 701 of the vehicle. For example, the vehicle control module 709 may be used to change parameters of one or more sensors, such as modify orientation, change output resolution and/or format type, increase or decrease capture rate, adjust captured dynamic range, adjust focus of a camera, enable and/or disable a sensor, and so on. In some embodiments, the vehicle control module 709 may be used to change parameters of the image preprocessor 703, such as modifying the frequency range of a filter, adjusting feature and/or edge detection parameters, adjusting channels and bit depth, and so on.
In various embodiments, the vehicle control module 709 is used to implement autonomous driving and/or driver-assist control of the vehicle.
In some embodiments, the network interface 711 is a communication interface for transmitting and/or receiving data, including voice data. In various embodiments, the network interface 711 includes a cellular or wireless interface for interfacing with a remote server to connect and place voice calls, transmit and/or receive text messages, transmit sensor data, receive updates to the autonomous driving system (including trigger classifiers and trigger attributes), and the like. For example, the network interface 711 may be used to receive updates for instructions and/or operating parameters of the sensor 701, the image preprocessor 703, the deep learning network 705, the AI processor 707, the vehicle control module 709, and/or the trigger classifier module 713. For example, the network interface 711 can be used to update the machine learning model of the deep learning network 705. As another example, the network interface 711 may be used to update firmware of the sensor 701 and/or operating parameters of the image preprocessor 703, such as image processing parameters.
In some embodiments, the network interface 711 is used to transmit sensor data identified by the trigger classifier module 713. For example, sensor data corresponding to a particular use case identified by the trigger classifier and satisfying the conditions of the associated trigger attributes is transmitted to a computer server, such as a remote computer server, via the network interface 711. In some embodiments, the trigger classifier and trigger attributes are updated via the network interface 711. The updated trigger classifier and trigger attributes are installed in the trigger classifier module 713 and are used to identify and retain sensor data corresponding to a particular use case.
In some embodiments, the network interface 711 is used to contact emergency services in the event of an accident or near accident. For example, in the event of a collision, the network interface 711 may be used to contact emergency services for assistance and may inform the emergency services of the location of the vehicle and the details of the collision. In various embodiments, the network interface 711 is used to implement autonomous driving features, such as accessing calendar information to retrieve and/or update a destination location and/or an expected arrival time.
In some embodiments, the trigger classifier module 713 is utilized to identify and retain sensor data corresponding to a particular use case. For example, the trigger classifier module 713 determines a classifier score for data captured by one or more of the sensors 701. The classifier score is compared to a threshold, and qualifying sensor data may be retained and communicated to a remote computer server via the network interface 711. In some embodiments, the trigger classifier module 713 utilizes trigger attributes to determine whether appropriate conditions are met to determine a classifier score and/or to retain sensor data that meets a classifier score threshold. In some embodiments, the trigger classifier module is a support vector machine and receives an intermediate output of the deep learning network 705 as an input representing the sensor data of the sensor 701. In some embodiments, the trigger classifier module 713 is configured to receive intermediate results of one or more layers of the deep learning network 705. The output of a particular layer may depend on the trigger classifier and/or the trigger attributes. For example, some use cases may use earlier intermediate results, and other use cases may use later intermediate results. In some embodiments, the AI processor 707 can be utilized to perform the processing of the trigger classifier module 713. In various embodiments, the sensor data identified by the trigger classifier module 713 is used to create a new training data set to identify a particular use case.
The various aspects, embodiments, implementations, or features of the described embodiments may be used separately or in any combination. The various aspects of the described embodiments may be implemented in software, hardware, or a combination of hardware and software. The described embodiments may also be embodied as computer readable code on a computer readable medium for controlling manufacturing operations or may be embodied as computer readable code on a computer readable medium for controlling a manufacturing line. The computer readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable medium include read-only memory, random-access memory, CD-ROMs, HDDs, DVDs, magnetic tape, and optical data storage devices. The computer readable medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the described embodiments. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the described embodiments. Thus, the foregoing descriptions of specific embodiments have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the described embodiments to the precise forms disclosed. It will be apparent to those skilled in the art that many modifications and variations are possible in light of the above teaching.
It should be understood that each of the processes, methods, and algorithms described herein and/or depicted in the figures can be embodied in, or fully or partially automated by, code modules executed by one or more physical computing systems, hardware computer processors, special purpose circuitry, and/or electronic hardware configured to execute specific and particular computer instructions. For example, the computing system may comprise a general purpose computer (e.g., a server) or a special purpose computer, special purpose circuitry, etc., which is programmed with the specific computer instructions. A code module may be compiled and linked into an executable program, installed in a dynamically linked library, or written in an interpreted programming language. In some embodiments, certain operations and methods may be performed by circuitry that is specific to a given function.
Furthermore, certain embodiments of the disclosed functionality are complex enough mathematically, computationally, or technically that dedicated hardware or one or more physical computing devices (with appropriate dedicated executable instructions) may be required to perform the functionality, e.g., due to the capacity or complexity of the computations involved, or to provide the results in substantially real time. For example, video may include many frames, each having millions of pixels, and require specially programmed computer hardware to process the video data to provide the desired image processing task or application in a commercially reasonable amount of time.
The code modules or any type of data may be stored on any type of non-transitory computer readable medium, such as physical computer storage, including hard drives, solid state memory, Random Access Memory (RAM), Read Only Memory (ROM), optical disks, volatile or non-volatile storage, combinations thereof, and so forth. In some embodiments, the non-transitory computer readable medium may be part of one or more of a local processing and data module, a remote processing module, and a remote data store. The methods and modules (or data) may also be transmitted as a generated data signal (e.g., as part of a carrier wave or other analog or digital propagated signal) on a variety of computer-readable transmission media, including wireless-based and wired/cable-based media, and may take a variety of forms (e.g., as part of a single or multiplexed analog signal, or as multiple discrete digital packets or frames). The results of the disclosed processes or process steps may be stored permanently or otherwise in any type of non-transitory tangible computer storage, or may be communicated via a computer readable transmission medium.
Any process, block, state, step, or function in the flowcharts described herein and/or depicted in the figures should be understood as potentially representing a module, segment, or portion of code which includes one or more executable instructions for implementing specific functions (e.g., logical or arithmetic) or steps in the process. Various processes, blocks, states, steps or functions may be combined, rearranged, added to, deleted from, modified or otherwise changed in connection with the illustrative examples provided herein. In some embodiments, additional or different computing systems or code modules may perform some or all of the functions described herein. The methods and processes described herein are also not limited to any particular order, and the blocks, steps, or states associated therewith may be performed in any other appropriate order, such as serially, in parallel, or in some other manner. Tasks or events may be added to or deleted from the disclosed example embodiments. Moreover, the separation of various system components in the embodiments described herein is for illustrative purposes, and should not be understood as requiring such separation in all embodiments. It should be understood that the described program components, methods, and systems can generally be integrated together in a single computer product or packaged into multiple computer products.
In the foregoing specification, one or more innovations have been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the innovation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Indeed, it should be understood that the systems and methods of the present disclosure each have several innovative aspects, no single one of which is fully responsible for or requires the desirable properties disclosed herein. The various features and processes described above may be used independently of one another or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of the present disclosure.
Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination. No single feature or group of features is required or essential to each embodiment.
It will be understood that, unless specifically stated otherwise, or otherwise understood in the context of usage, conditional language such as "can," "might," "may," "for example," and the like, as used herein, are generally intended to convey that certain embodiments include certain features, elements and/or steps, while other embodiments do not. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without author input or prompting, whether such features, elements and/or steps are included or are to be performed in any particular embodiment. The terms "comprising," "including," "having," and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and the like. Also, the term "or" is used in its inclusive sense (and not in its exclusive sense), so that when used to connect a list of elements, for example, the term "or" means one, some, or all of the elements in the list. In addition, the articles "a", "an", and "the" as used in this application and the appended claims should be construed to mean "one or more" or "at least one" unless specified otherwise. Similarly, while operations may be depicted in the drawings in a particular order, it will be appreciated that such operations need not be performed in the particular order shown or in sequential order, or that all illustrated operations need not be performed, to achieve desirable results. Further, the figures may schematically depict one or more example processes in the form of a flow diagram. However, other operations not depicted may be incorporated in the example methods and processes schematically illustrated. 
For example, one or more additional operations may be performed before, after, concurrently with, or between any of the illustrated operations. Additionally, in other embodiments, the operations may be rearranged or reordered. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products. Additionally, other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results.
Thus, the claims are not intended to be limited to the embodiments shown herein but are to be accorded the widest scope consistent with the present disclosure and the principles and novel features disclosed herein.
Claims (20)
1. A method, comprising:
receiving sensor data;
applying a neural network to the sensor data;
applying a trigger classifier to intermediate results of the neural network to determine a classifier score for the sensor data; and
determining whether to transmit at least a portion of the sensor data via a computer network based at least in part on the classifier score.
2. The method of claim 1, wherein the intermediate result is an output of an intermediate layer of the neural network.
3. The method of claim 2, wherein the intermediate result is an output of a penultimate layer of the neural network.
4. The method of claim 1, wherein the neural network is a convolutional neural network.
5. The method of claim 1, wherein the trigger classifier is trained using a training data set that is analyzed at least in part by a second neural network using a machine learning model that is based on the neural network used to determine the classifier score.
6. The method of claim 5, wherein the trigger classifier is trained using an input vector, wherein the input vector is an output of a layer of the second neural network.
7. The method of claim 6, wherein the layer of the second neural network is dynamically selected.
8. The method of claim 6, wherein the trigger classifier is wirelessly communicated to a vehicle applying the neural network.
9. The method of claim 1, wherein the trigger classifier has been generated based on an identified need for improvement of the neural network.
10. The method of claim 1, wherein the trigger classifier is used to identify one or more of: tunnel entrance, tunnel exit, bifurcation in a road, obstacle in a road, road lane line, or drivable space.
11. The method of claim 1, wherein determining whether to transmit the at least a portion of the sensor data via the computer network based at least in part on the classifier score comprises: comparing the classifier score to a threshold.
12. The method of claim 1, further comprising: determining whether to apply the trigger classifier based on one or more required conditions.
13. The method of claim 12, wherein the one or more required conditions are based on one or more of: a length of driving time, a minimum time since sensor data was last retained for the trigger classifier, a disengagement event associated with an autonomous driving feature, a vehicle type, a steering angle threshold, or a road type requirement.
14. The method of claim 1, wherein the trigger classifier specifies a particular layer of the neural network from which to receive the intermediate results.
15. The method of claim 1, further comprising: transmitting the at least a portion of the sensor data and metadata identifying one or more of: a classifier score, a location, a timestamp, a road type, a length of time since sensor data was previously transmitted, or a vehicle type.
16. The method of claim 1, further comprising: transmitting the at least a portion of the sensor data and an operating condition of a vehicle, the operating condition identifying one or more of: vehicle speed, vehicle acceleration, vehicle braking, or vehicle steering angle.
17. The method of claim 1, further comprising: receiving, via the computer network, the trigger classifier represented by a weight vector.
18. The method of claim 17, wherein the trigger classifier is represented by the weight vector and a bias.
19. A computer program product, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
receiving sensor data;
applying a neural network to the sensor data;
applying a trigger classifier to intermediate results of the neural network to determine a classifier score for the sensor data; and
determining whether to transmit at least a portion of the sensor data via a computer network based at least in part on the classifier score.
20. A system, comprising:
sensors of a vehicle;
an artificial intelligence processor;
a vehicle control module;
an image signal processor configured to:
receive an image captured using the sensors;
process the captured image; and
provide the processed image to a neural network;
a memory coupled with the artificial intelligence processor, wherein the memory is configured to provide instructions to the artificial intelligence processor that, when executed, cause the artificial intelligence processor to:
receive the processed image;
perform inference on the processed image using the neural network;
provide intermediate results of the neural network to a trigger classifier, wherein the trigger classifier is used to determine a classifier score corresponding to the captured image; and
provide the inference results of the neural network to the vehicle control module to operate the vehicle at least partially autonomously; and
a network interface configured to:
communicate at least a portion of the captured image based at least in part on the classifier score.
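The method of claims 1, 3, 11, and 17-18 can be illustrated with a minimal, hypothetical sketch. This is not the patented implementation: the network, its sizes, the weights, and the 0.5 threshold are all invented for illustration; only the overall flow (apply a network, score an intermediate-layer output with a weight-vector-and-bias trigger classifier, compare the score to a threshold to decide whether to transmit) follows the claims.

```python
import math
import random

def relu(vec):
    return [max(x, 0.0) for x in vec]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNetwork:
    """Stand-in for the on-vehicle neural network; layer sizes are arbitrary."""
    def __init__(self, rng, n_in=8, n_hidden=16, n_out=4):
        self.w1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_out)]

    def forward(self, sensor_data):
        # Intermediate (penultimate-layer) output, as in claims 2-3.
        intermediate = relu(matvec(self.w1, sensor_data))
        # Final inference result, used for vehicle control in claim 20.
        output = matvec(self.w2, intermediate)
        return output, intermediate

def trigger_score(intermediate, weights, bias):
    # Trigger classifier represented by a weight vector and a bias
    # (claims 17-18), applied to the intermediate result (claim 1).
    return sigmoid(sum(w * x for w, x in zip(weights, intermediate)) + bias)

rng = random.Random(0)
net = TinyNetwork(rng)

# Weight vector and bias as they might be received via a computer network.
weights = [rng.gauss(0, 1) for _ in range(16)]
bias = 0.0

sensor_data = [rng.gauss(0, 1) for _ in range(8)]
_, intermediate = net.forward(sensor_data)
score = trigger_score(intermediate, weights, bias)

# Threshold comparison from claim 11 decides whether to transmit the data.
transmit = score > 0.5
print(f"classifier score: {score:.3f}, transmit: {transmit}")
```

In practice the score would gate whether the captured sensor data (plus metadata such as location and timestamp, per claim 15) is uploaded for use as training data.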
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US62/731,651 | 2018-09-14 | 2018-09-14 | Neural network training |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK40044289A true HK40044289A (en) | 2021-09-30 |
| HK40044289B HK40044289B (en) | 2025-01-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7512452B2 (en) | System and method for acquiring training data | |
| JP7766661B2 (en) | 3D feature prediction for autonomous driving | |
| US12223428B2 (en) | Generating ground truth for machine learning from time series elements | |
| HK40044289A (en) | System and method for obtaining training data | |
| HK40044289B (en) | System and method for obtaining training data | |
| HK40062368A (en) | Generating ground truth for machine learning from time series elements | |
| HK40062371A (en) | Predicting three-dimensional features for autonomous driving |