GB2574224A

GB2574224A - Traffic management system

Info

Publication number: GB2574224A
Application number: GB1808854.2A
Authority: GB
Inventors: Howell Shaun
Original assignee: Vivacity Labs Ltd
Current assignee: Vivacity Labs Ltd
Priority date: 2018-05-31
Filing date: 2018-05-31
Publication date: 2019-12-04
Anticipated expiration: 2038-05-31
Also published as: GB2574224B; WO2019228848A1; GB201808854D0

Abstract

An active traffic management system includes road network sensors and output means (e.g. overhead signs, traffic signals or movable barriers). The output means control traffic based on estimates of future states of the road network which are repeatedly predicted by a machine learning model (e.g. recurrent neural-network including long short-term memory (LSTM) cells) using data from the sensors. Meanwhile a stochastic model generation module uses the sensor data to generate, train and test models to replace the previous model. This may include generating a population of models, each defined by a set of hyper-parameters; allocating random values to the hyper-parameters; training and testing each model to assign it a fitness score; select best models; applying a mutation operation to some of the best models to change a hyper-parameter of each mutated model; and applying a crossover operation to pairs of the best models to create new child models based on combination of the hyper-parameters in respective pairs; until an end condition is met. The prediction model is thus repeatedly updated to improve accuracy and tolerance to gaps in sensor data streams.

Description

TRAFFIC MANAGEMENT SYSTEM

The present invention relates to predicting future states, for example congestion, in a traffic system.

BACKGROUND TO THE INVENTION

It is now well known to equip road systems with sensors to monitor various aspects of road use. For example, sensors can be provided which detect when there is queuing in a particular part of a road system, and more generally provide details of traffic speed and density. Sensors can also show which parts of road systems are used most by cars, larger vehicles, bicycles, motorcycles, pedestrians and so on. They can be used to detect behaviours such as lane changing taking place at particular locations. Incidents such as collisions or breakdowns can also be detected in near real time.

The Applicant’s co-pending application W02018051200 discloses a method of using an image capture unit to identify different road users and produce data relating to those road users.

A control centre in a large city or for a network of major trunk roads will typically use data from such sensors, together with collateral information from police, the public, and online services such as Google (RTM) Maps, to understand in near real time what is going on in the road system and to make appropriate decisions. For example, as a result of larger-than-usual traffic density a lower speed limit may be temporarily imposed to reduce the risk of collisions and try to avoid a stop-start state developing. As a result of a collision or breakdown lanes may be closed and/or diversions put into place.

Current systems are almost entirely reactive. In other words, a temporary speed limit, suggesting alternative routes, changing traffic light phasing, deployment of police to redirect traffic, or another mitigating measure would only be put in place after congestion has been detected. The earlier that such measures can be put in place the more effective they are, as they prevent more cars entering the delayed and congested region. The mitigations deployed today are therefore less effective than they could have been if it had been put in place earlier.

There is therefore a need for a system which can predict congestion and other traffic problems before they occur.

Collisions and breakdowns are in their nature unpredictable events. However, the knock-on effects of such an incident can potentially be predicted. Putting mitigating measures in place as early as possible is critical to reducing the negative impacts of such unpredictable events. In particular, it has been found that a 10 minute increase in the duration of primary accidents leads to a 15% increase in the frequency of secondary accidents.

There is therefore also a need for a system which can predict the effects of traffic incidents, so that appropriate mitigating measures can be put in place.

There are some known approaches to predicting traffic flows including statistical modelling approaches, simulations, and machine-learning based prediction. Of these, the state-of-the-art machine-learning based approaches are the most accurate. Various types of neural networks have been used in this field (e.g. recurrent neural networks, particularly including long short-term memory (LSTM) cells, feed forward deep neural networks and other types of recurrent neural network), as well as knearest-neighbours models, pure convolutional models, support vector regression models and Bayesian inference.

However, the state-of-the-art machine-learning traffic prediction products generally create a single model, which is updated infrequently with a historical view of the data - if it is updated at all. These products tend to lose prediction accuracy over time, since changes to road systems and road use lead to the model becoming invalid. Prediction accuracy in response to unusual events can be poor. Also, in a network of sensors around a road system it is common for sensors to come on- and off-line unpredictably due to faults in the sensors, obstructions, power supply issues, and so on. The input data dimensionality is not fixed and some data streams may be incomplete. These issues lead to a lack of prediction accuracy in state-of-the-art machine-learning traffic prediction products. With these known products, incorporating a new sensor involves a manual process of rebuilding the model. The statistical models and deterministic models similarly require regular retuning to maintain an acceptable level of accuracy.

It is an object of the invention to provide a traffic prediction product with improved accuracy and improved tolerance to gaps in data streams from sensors.

SUMMARY OF THE INVENTION

According to the present invention, there is provided a traffic management system for a road network comprising:

a plurality of sensors for monitoring vehicles and/or other users of the road network;

a traffic prediction system; and active traffic management means, the active traffic management means including outputs for controlling the vehicles and/or other users of the road network, the sensors providing inputs to the traffic prediction system, the traffic prediction system providing estimates of a future state of the road network, and the active traffic management means using outputs to control the vehicles and/or other users of the road network in response to the estimates of future state, characterised in that the traffic prediction system includes a prediction module adapted to apply a machine learning model to the input data from the sensors to produce predictions, and a stochastic model generation module adapted to generate, train and test machine learning models using the input data from the sensors, the model generation module sending an updated model to the prediction module whenever an updated and improved model is available, and the prediction module replacing any previous model with the updated model when it is received from the model generation module.

The stochastic model generation module may be an evolutionary model generation module.

The sensors may be similar to those described in WO2018051200. They may be camera-based sensors including entity detection, for example capable of identifying cars, bicycles, pedestrians, vans, buses, lorries, etc in the field of view of the camera. The sensors are preferably capable of detecting the type, speed and direction at least of each entity in the field of view of the sensor.

Sensors may be placed, for example at intersections in a “grid” system of roads. Placement of sensors will depend on the characteristics of the particular road network.

In some embodiments, instead of or as well as “infrastructure mounted” sensors such as those in WO2018051200, the sensors may include vehicle sensors such as smart phones or connected vehicles or telematics boxes.

The outputs from of the active traffic management means may include at a basic level, overhead signs at locations in the road network for providing information to road users, closing lanes, setting temporary speed limits, and similar. Other outputs may include for example speed enforcement cameras, traffic signals, and movable barriers for physically opening or closing links on the road network to different types of traffic.

It is envisaged that in some systems the active traffic management means may include transmitters for sending data directly to vehicles. At a basic level the data sent to the vehicle may just be displayed on an in-vehicle display, but it is also possible for some vehicles, for example autonomous, semi-autonomous or driverless vehicles, to be controlled directly by the active traffic-management means, for example to cause the vehicle to take a certain route to its destination, or to limit its speed.

The models produced by the evolutionary model generation module and executed by the prediction module may be a type of neural network, for example a recurrent neural network including long short-term memory (LSTM) cells.

The prediction model may run repeatedly on new data. As soon as the model has been executed and predictions generated, the model will be run again on the most upto-date data. It is found that in practical embodiments, all steps of the prediction process may take around 60 seconds. By the time a prediction has been generated therefore, there will be another 60 seconds or so worth of more up-to-date data which can be used as the basis for the next prediction. Preferably, the sensors continually feed data into a database and the prediction model has access to that database to retrieve data each time the model is executed.

The prediction module may carry out a step of cleaning input data before the model is executed. The particular characteristics of the model in use (which may change frequently as the evolutionary model generation module supplies new models) will dictate which data streams are required. Not all models will require all data streams available from the input sensors. However, the required data streams need to be in the correct order and with no missing values before a model can be executed. In a large road network spanning for example a city centre or an even larger area, it is likely that at any particular time at least one sensor may be unavailable for one reason or another, and so there may be gaps in the input data. Any missing values may be filled with appropriate data to facilitate running of the model. This may be achieved, for example, using a historical median approach in which the target values are filled by reference to typical values of historical data at that time of the day, on that day of the week, for example. If insufficient data is available, the time window examined in historical data may be iteratively broadened. An alternative way of filling missing data, which is found to be appropriate for relatively short missing periods, is to fill the missing data by interpolating between the time period immediately before the gap and immediately afterwards.

The data cleaning step can be carried out incrementally where the same prediction model is being run on updated data. The data cleaning step may also include a sorting step, since due to transmission irregularities from the sensors it may be that the “new data” which has been received in the last 60 seconds or so in some data streams relates to events which took place at an earlier time. In this case only the last 60 seconds or so of data will need to be cleaned. However, when a new model becomes available the relevant database tables may be dropped and re-built, the entire cleaning process being carried out afresh. This is because the new model may include a different subset of data streams, and/or require the data to be ordered differently.

In some embodiments, historical data may be periodically checked, and cleaned if required. This periodic cleaning may occur even on data which was already cleaned, since the filling in of missing data can be improved once there is more data available from which estimates based on averages can be derived. For that reason, data which is “filled in” is preferably marked as an estimate, to allow that estimate to be revised or improved later, when more data is available.

The prediction module may carry out a further data preparation step of downsampling the data. The size of the timestep required may be a hyperparameter which is provided whenever a new model is provided to the prediction module. The size of the timestep may therefore be larger than the maximum resolution of the available data, in which case a downsampling step needs to take place.

The prediction module may carry out a further data preparation step on the cleaned data before the model is executed. The cleaned data is in the form of a 2D tensor with each column representing an individual data stream and each row representing an individual timestep. The further data preparation step may include ‘lagging’ the data backwards through time by a number of lag steps. The number of lag steps is a hyperparameter which is provided whenever a new model is provided to the prediction module. The result of the ‘lagging’ is a new 3D tensor in which each row contains the value for each data stream for a particular timestamp and also (along the third axis) for one or more previous timestamps. Each input node to the neural network therefore accepts (at each time step) a tuple of values for a particular data stream at times (t, t1, t-2), for example.

As a result of the lagging process the earliest timesteps will now have missing data (if the number of lag steps is two then the earliest and second earliest timesteps will lack the data for their two previous timesteps). These earliest and incomplete rows are removed from the tensor.

Another data preparation step may include augmenting the sensor data with collateral data. Examples of collateral data include weather-related data such as rainfall, temperature, visibility etc. The collateral data may be obtained from internet services but it will be appreciated that in some cases individual sensors may include data streams for things like temperature. Apart from weather-related data, other collateral sources may include for example social media or news feeds. Fairly simple filters could be applied to these feeds to pick up words like “accident” or “congestion” as well as particular placenames. More complex classifiers could be applied to attempt to better filter this type of data. Alternatively, a filtering /reporting process could take place manually. Sports fixtures and details of similar events are another type of collateral data which might be usefully added to the data collected from sensors.

Once the data has been cleaned and prepared to create an input tensor, the prediction model feeds the input tensor into the neural-network prediction model. This results in a multi-timestep, multi-variate output which represents the model’s predictions about the future state of the network. This prediction can in turn be fed to the active traffic management means so that appropriate action can be taken - for example, if the model predicts a very large volume of traffic on a particular road in ten minutes’ time, then it may be useful to reduce the speed limit on that stretch and/or to open hard shoulder lanes, for example.

The purpose of the evolutionary model generation module is to produce, train and test candidate models to search for the best possible model based on current circumstances. When a ‘best’ model is identified, it is sent to the prediction module to replace the current model in place.

The model generation module makes use of the same input data from the sensors as the prediction module, i.e. the same input data from the sensors - not necessarily the same cleaned I filtered I prepared data. It uses the same process of replacing any missing or incorrect values with typical values derived from historical data or interpolated. The data may be truncated to, for example, the last year.

The model generation module may carry out a step of filtering available data streams. In other words, a subset of the available data streams may be selected for possible inclusion in the models. Various criteria may be used for the filtering process, for example:

(i) “Usefulness” scores may be available for different data streams. A particular data stream may be useful if predicting future values for that data stream has a particular operational purpose. These scores may be to some extent manually applied, and may put a constraint on the extent to which some of the other factors may cause particular streams to be disregarded.

(ii) The amount of historical data available. It has been found that in some practical implementations, a minimum of two weeks of data must be available for a stream to be included in a model. A new sensor that comes online or an upgraded sensor that produces a new stream of data will therefore need to be deployed for at least this period of time before the new data has any impact on predictions.

(iii) Within the data available, the data stream is of sufficiently high quality - i.e. that the amount of missing or erroneous data does not exceed some threshold.

(iv) Within a more recent timeframe, another, possibly different, threshold for data quality might be set.

The criteria for selecting - and hence deselecting - certain data streams may be tailored in particular to ensure tractability with available hardware.

The evolutionary model generation module may carry out a step of creating a random population of candidate models. Each model may be, for example, an LSTM-based recurrent neural network model. The model is defined by a set of model hyperparameters. When the population of random models is generated, the hyperparameters of each model in the population are given random values. As an alternative, the initial values might be based on a heuristic, or be generated otherwise than completely randomly. The initial values in some embodiments may include both non-random and random elements.

The size of the population may be configurable. In some embodiments, the population size may be about 200.

Examples of hyperparameters are given below. In any particular embodiment, the combination of any preset configuration and the complete list of randomized hyperparameters will completely define a model, together with the trained parameters.

Hyperparameters which are initially randomized may include:

• Timestep - the frequency to which the data is downsampled;

• Loss - the loss function to use in training a candidate model;

• Widths - the list of layer widths; i.e. the number of nodes in each layer of the model;

• Lag - the number of timesteps to lag the data through along the third axis to create a 3D tensor for input to the model;

• Epochs - the number of epochs for which the model is trained;

• Batch size - the batch size to use when training the model;

• Optimiser - the name of the optimisation algorithm to use in training. This may be selected from a set of known optimisation algorithms which are available to the model generation module;

• LSTM dropout - the fraction of the inputs to each layer to drop;

• Recurrent dropout - the fraction of the recurrent state connections to drop;

• L1 and L2 reg: the amount of L1 and L2 regularisation to apply to each of the LSTM parts:

o Kernel - the weights matrices used within the LSTM cell, i.e. at each of its gates;

o Activation - the output of an LTSM layer;

o Bias - the bias vectors used within the LSTM cell, i.e. at each of its gates; o Recurrent - the weights matrices used at the recurrent connection of each gate • Activation - the type of activation function used in each non-recurrent function;

• Recurrent activation - the type of activation function used in each recurrent function.

For each model in the population, a fitness value is calculated. The fitness value is a measure of the model’s accuracy of prediction. To calculate the fitness value, each model is trained and tested according to methods known in the art. Briefly, the input data is separated into three subsets, say subsets A, B and C. The quantity of data in each subset may be for example split in the ratio 50:20:30. For training the models and calculating fitness values of each model, set A is training data and set B is validation data, the training data being used to adjust the parameters in the neural network and the validation data being used to detect and prevent overfitting. Set C is testing data, the testing data being then used to measure the predictive accuracy of the trained neural network and assign a fitness score based on a hyper-loss function.

The models in the population may then be sorted based on fitness. The population is filtered I selected by choosing a predetermined percentage of the best models. In some embodiments, the selection includes a random aspect, for example where a predetermined percentage of the best models (by fitness score) are selected along with a random selection of models with lower fitness scores.

The model generation function may carry out a mutation phase. For each model left in the population after filtering, a mutation operation may be performed or not with a predetermined probability. For example, in some embodiments there may be a 5% probability in each case that a model will be subject to a mutation. A simple example of one way in which a mutation is implemented is that one of the hyperparameters (chosen at random) is reassigned a random value.

The model generation function may carry out a crossover phase. After mutation, the number of models in the population is increased to the predetermined population size (for example 200) by crossover operations. One way a crossover can be implemented is for each ‘child’ model, two ‘parent’ models are chosen at random, and to set each of the hyperparameters in the child model either the value of that hyperparameter in one parent or the value of that hyperparameter in the other parent is chosen. The “crossover” process is repeated until enough ‘child’ models have been created for the population to reach its predetermined size (e.g. about 200 models).

Once the final population is again at its predetermined size, fitness is calculated for each model in the new population. For parent models which were not dropped in the selection phase or mutated in the mutation phase, this fitness score should in fact already be available and should not need to be re-calculated. However, all of the ‘child’ models and any ‘parent’ models which were mutated will have to be trained and tested.

At this point a decision needs to be made as to whether or not to continue evolving the population (i.e. whether to run another “hyper epoch”). Preferably, the evolutionary algorithm may run for a predetermined maximum number of hyper epochs and if that number has been reached then no further evolution will take place and the population becomes “final”.

In some embodiments, the models in the population after crossover may be re-tested using a different dataset from the test data used to evaluate fitness during the selection stage. For example, where set C was used as test data in the selection stage, the models may now be retested against dataset B. The end condition, where a ‘final’ population is settled upon and no further evolution occurs, may be the result of this retest. Typically, if some measure of the overall fitness of the population has deteriorated or not improved as a result of a single pass of the evolutionary algorithm (i.e. the selection, mutation and crossover phases), then no further evolutionary stages will occur and the population becomes “final”. The measure of the overall fitness of the population could be for example some kind of average of the fitness of each model as determined by the testing, or an average of the top 50% (or some other proportion) of the models. The state where the overall population fitness is no longer significantly improving is known as ‘hyperconvergence’.

If the maximum number of hyper-epochs has not been reached and the end condition has not been met, then another phase of evolution will begin starting from the new population, which is selected, mutated, and crossed-over. This will be repeated either until the maximum number of hyper epochs is reached or the end condition is met.

io

When the evolutionary algorithm is terminated the best model in the population can be ‘activated’, i.e. sent to the prediction module for live use. Before this is done, in some embodiments the performance of the new ‘best’ model in the evolved population may be compared against the currently ‘activated’ model, to ensure that models in the prediction module are only replaced with better models.

Other types of evolutionary algorithm, or more generally stochastic optimisation algorithm, may be used in different embodiments. For example, a stochastic optimisation algorithm based on a gradient descent method may be suitable.

According to a second aspect of the invention, there is provided a method of actively managing a road network as claimed in claim 13.

Preferably I optional features of the second aspect of the invention are set out in claims 14 to 24.

According to a third aspect of the invention, there is provided a computer program product according to claim 25.

DESCRIPTION OF THE DRAWINGS

For a better understanding of the invention and to show more clearly how it may be carried into effect, specific embodiments and implementations will now be described with reference to the accompanying drawings, in which:

Figure 1 shows a visualisation of a predictive model, and changes which may be made to the model as part of an evolutionary process;

Figure 2 shows a visualisation of inputs to a predictive model, in which one of the inputs becomes unavailable and the predictive model is updated as a result;

Figure 3 shows a visualisation of inputs to a predictive model, in which the model is updated to take account of a greater time period in each input stream; and

Figure 4 is a flowchart illustrating the process of creating new models by an evolutionary process.

DESCRIPTION OF SPECIFIC EMBODIMENTS

Figure 1 illustrates a predictive model in the form of a neural network 10. Specifically, the neural network 10 may be a recurrent neural network made from LSTM (Long

Short-Term Memory) cells. Such models are defined in terms of ‘parameters’, which are determined during training of the model using training data, and ‘hyperparameters’ which are fixed properties of the model which are not changed during training. Some of the hyperparameters relate to the ‘structure’ of the model, for example the number of layers and the number of nodes per layer. However there are other hyperparameters which define the model pre-training which do not relate to the ‘visible’ structure as shown in Figure 1. For example, the number of lag steps applied to the data and the number of epochs for which the model is trained.

Hyperparameters also include the data streams to be included as inputs to the model and how they are filtered, the duration of data used in making predictions and the sample rate, and the data streams to be included as outputs (i.e. the data streams the future values of which are being predicted) and the future timescales over which predictions are made. The types of nodes and types of layers are also potential hyperparameters - for example layers can be RNN or CNN layers and nodes can be LSTM or GRU nodes.

In Figure 1, models 10’ and 10” are two different models having different, but similar, hyperparameters. Model 10’ has an extra layer compared with model 10. Model 10” has two extra nodes in its second layer.

In the context of this invention, the inputs to the models are data streams from traffic sensors. The outputs are multi-timestep multi-variate predictions of a future state of the traffic system.

Referring to Figure 2, the input data streams in the traffic management system come from a network of traffic sensors 12. For example, these sensors may be located at every intersection on a road network spanning a city centre or an even wider area. The sensors typically communicate wirelessly with a central hub where the prediction module is located. At any particular time, in a large sensor network there may be problems with individual faulty sensors, power issues, transmission issues etc. It is important that the system is resilient to these problems without an unduly detrimental impact on prediction quality. As a result of one of the sensors 12’ in Figure 2 becoming unavailable or the data quality being poor, model 10 needs to be changed to require fewer inputs. This means providing a new model 10’.

Figure 3 shows another potential change to model hyperparameters and emphasises that the model hyperparameters are not limited to the structure of the model as shown in the schematic visualisation. In this example, the time window 14 of historical data in the data streams 16 from the sensors 12 is widened. This is another example of a model hyperparameter that can be changed. The changes to the models are determined by the evolutionary algorithm described below.

In the prior art, changes to model hyperparameters, for example to take account of a faulty sensor or just to refine the model based on changes to the traffic network or the behaviour of traffic in that network, are made either manually or not at all.

Figure 4 is a flowchart of an evolutionary algorithm used for generating models. The model generation as such begins at step 20. Where no population already exists, a population of models is created randomly. Each model in the population will be a different model, i.e. it will be have a different set of hyperparameters. Once the models are created, they are trained and tested using training, validation and test data. Once the models are trained both hyperparameters and parameters are determined for each model, and each model is associated with a score as to its predictive accuracy, based on the testing.

Step 22 is a filtering or “selection” stage. The population is reduced in size to save only the “best” models, for example the top 30% of models based on the scores. Additionally, a random selection, for example a random sample of 10%, of the remaining non-best models are saved. The remainder of the models are discarded. Different percentages of the ‘best’ and ‘non-best’ models may be saved in different embodiments.

Step 24 is the “mutation” stage. Not all models have to be mutated. For example embodiments may cycle through each model and decide that a mutation is to occur, for example with 5% probability. If a mutation is to occur, typically one of the hyperparameters of the model will be changed. The purpose of mutation is to change the model slightly to make a different, but similar, ‘mutated’ model.

Step 26 is the “crossover” stage. In this stage the number of models in the population is increased again typically back up to the preset population size at which the random population was originally created. New ‘child’ models are made by selecting at random two ‘parent’ models, and setting each hyperparameter of the ‘child’ model at random either equal to that hyperparameter in the one parent or equal to that hyperparameter in the other parent.

Once the population is back up to its full size, any models which have not been trained and tested (i.e. any models which have either been mutated or are new ‘child’ models) will be trained and tested, and a score associated with them.

In step 28, a decision has to be made as to whether to continue the evolutionary process. This is made based on two factors:

• If the number of hyper-epochs (i.e. the number of iterations of steps 22, 24, 26, 28) has reached a predetermined maximum number then the evolutionary process will finish and the population will be ‘final’;

• If ‘hyperconvergence’ is detected then the evolutionary process will finish and the population will be ‘final’;

• Otherwise, return to step 22.

Hyperconvergence is a state where the overall quality of the population of models is deteriorating, or is no longer significantly improving. The overall quality of the population may be measured based on the scores from training and testing. However, it is found in some embodiments to be preferable to use a different set of testing data to score for the detection of hyperconvergence, than the testing data used to score each individual model during the selection / filtering stage 22.

The evolutionary process terminates at step 30 in a state where there is a final population of scored models. One of these models (the one with the top score) is the best model, and this model may be sent to the prediction module to replace the currently running model. Preferably, a final check is made to ensure that the new best model is better than the currently running model.

In some embodiments, the best model or a number of the best models may be included in the starting population the next time the evolutionary process is run, rather than starting with a completely random new population.

The evolutionary process in its search for a new best model will train and test a potentially very large number of models. This is a process which has considerable memory requirements and typically in implementations garbage collection will be incomplete. Over time this means that resources, in particular memory, will not be completely released for re-use and the process may eventually run out of memory. This memory can typically be recovered by terminating and restarting the process. For that reason, each time a model has been trained and tested the state of the evolutionary process (i.e. all the models in the population and associated scores, and other state variables) will be saved to a ‘checkpoint’ on disk. If the process should crash then it can be restarted and recovered from the checkpoint (step 19 in Figure 4). As well as allowing recovery from crashes, in particular caused by memory leaks, this also allows new versions of the software to be deployed without interrupting and restarting the model generation process. Note that hyperconvergence in some embodiments may take for example a few weeks to occur.

Models are parameterised and serialised for storage. It is therefore unnecessary to hold more than one model in memory at once. The model is also sent in its parameterised and serialised form from the evolutionary model generation module to the prediction module when the model needs to be ‘activated’. Parameterising and serialising the model involves saving the following components:

• The IDs of the data streams included in the model, and the ordering of each data stream;

• The parameters used in standardising each of the input data features - i.e. to transform each input feature to the zero-mean and unit variance expected by the model;

• The model’s architecture;

• The model’s hyperparameters;

• The model’s parameters - i.e. everything about the model which is determined by training the model;

• Any other configuration used by the evolutionary algorithm.

The above is a very general list. Note that throughout this document “hyperparameters” is used to mean anything about a model which is determined by the evolutionary process. In many embodiments this will include aspects of the data streams and ordering, aspects of the model architecture, and may include other items on the above list. In most implementations, not everything will be typically evolvable as hyperparameters. In a particular implementation, the parts which form a complete definition of a model therefore fall into three broad mutually exclusive categories:

• Model parameters - everything which is determined by training the model;

• Model hyperparameters - everything which is determined by the evolutionary process;

• Everything else - which is fixed and therefore the same for every single model generated and used in a particular embodiment.

In the third category, some embodiments may allow some of the “fixed” properties to be manually changed - i.e. they form part of the configuration of a more general system. However, only the parameters and hyperparameters are changed automatically by the model generation module. Configurable properties of the evolutionary model generation module may include:

• Population size;

• Percentage of “top” I “best” models to keep in first filtering step;

• Probability of re-including a model which was excluded at the first filtering step (i.e. the probability of an individual “non-best” model being retained in the population), or the proportion of “non-best” models which are retained.

• Probability of a mutation taking place, per model;

• Maximum number of hyper-epochs;

• End condition for identifying hyper-convergence;

• Early stopping criteria for model training (i.e. end condition for identifying overfitting during training);

• For each model hyperparameter:

o If the value is a number, the minimum and maximum allowable values;

o If the value is a string, a list of the allowable values;

o For the ‘widths’ hyperparameter, the maximum numbers of layers and the maximum width of each layer;

o Any other required constraints for other hyperparameters with complex data structures;

• The loss function to use for the hyper-optimisation - i.e. scoring models to identify hyperconvergence;

• The ratio between training, testing, and validation datasets - noting that data may be split into sets A, B and C and that B and C may be validation and testing sets respectively during model-level training, and testing and validation sets respectively during hyper-optimisation;

• The future timesteps to be predicted;

• Which datastreams are deemed useful to predict;

• The type(s) of neural network architecture to include.

Some of the items in the above list in some embodiments may be evolvable hyperparameters.

The implementation of the prediction module and evolutionary model generation module is typically as two programs running simultaneously on the same computer, a database shared by the two programs, and various configuration files used by the programs. Both programs run continuously, iteratively making predictions and producing new models.

It will be understood that in some embodiments prediction and/or model generation could take place on different machines, and some aspects of the evolutionary model generation process in particular could be readily parallelised on a cluster of computers.

The sensor network is arranged so that data from the sensors arrives in the database and becomes available to both programs as quickly as possible. Preferably the sensors communicate wirelessly over cellular networks. Due to network conditions it is likely that data will sometimes arrive ‘out of order’, i.e. sooner from some sensors than from others. The data cleaning and filtering processes described allow for this.

The system of the invention is found to make better predictions of future traffic states than state-of-the-art systems. In particular, it has good predictive accuracy and can respond to transient and long-term changes in the road system such as roadworks, new roads, new signals, new lanes, etc. It can also respond to events such as collisions, breakdowns, and congestion. It is resilient to unreliable sensor networks and communications systems.

It will be understood that variations on the embodiments described will be apparent to the skilled person. The invention is defined in the claims.

Claims

1. A traffic management system for a road network, the traffic management system comprising:

a traffic prediction system; and active traffic management means, the active traffic management means including outputs for controlling the vehicles and/or other users of the road network, the sensors providing inputs to the traffic prediction system, the traffic prediction system providing estimates of a future state of the road network, and the active traffic management means using outputs to control the vehicles and/or other users of the road network in response to the estimates of future state, characterised in that the traffic prediction system includes a prediction module adapted to apply a machine learning model to the input data from the sensors to produce predictions, and a stochastic model generation module adapted to generate, train and test machine learning models using the input data from the sensors, the model generation module sending an updated model to the prediction module whenever an updated model is available, and the prediction module replacing any previous model with the updated model when it is received from the model generation module.

2. A traffic management system as claimed in claim 1, in which the machine learning models are neural-network models.

3. A traffic management system as claimed in claim 2, in which the neural-network models are recurrent neural-network models.

4. A traffic management system as claimed in claim 3, in which the recurrent neural-network models include long short-term memory (LSTM) cells.

5. A traffic management system as claimed in any of the preceding claims, the prediction module being adapted to carry out a step of selecting input data based on a hyperparameter forming part of the definition of the machine learning model provided by the model generation module.

6. A traffic management system as claimed in any of the preceding claims, the prediction module being adapted to carry out a step of cleaning input data, the cleaning step including identifying and filling any missing values in input data.

7. A traffic management system as claimed in claim 6, in which filled-in values of missing input data are marked as estimates, and in which the estimates are periodically revised as more data becomes available.

8. A traffic management system as claimed in claim 6 or claim 7, in which missing values in input data are filled by a historical median value and/or by interpolation.

9. A traffic management system as claimed in any of the preceding claims, in which the prediction module is adapted to carry out a step of preparing data by creating a 3D tensor by lagging the data backwards through time by a number of lag steps, in the 3D tensor each column representing a data stream, each row representing a particular timestep and also (along the third axis) one or more previous timesteps.

10. A traffic management system as claimed in any of the preceding claims, in which the prediction module is adapted to carry out a step of preparing data by augmenting the data with at least one collateral data stream.

11. A traffic management system as claimed in claim 10, in which the collateral data stream is related to the weather.

12. A traffic management system as claimed in any of the preceding claims, in which the model generation module is adapted to carry out a step of filtering available data streams to select a subset of data streams for possible inclusion in models.

13. A traffic management system as claimed in claim 12, in which the step of filtering data streams takes into account one or more of the following criteria:

• usefulness scores associated with data streams;

• the amount of historical data available in each data stream;

• within the data available for each data stream, the quality and completeness of the data over a first threshold; and • within a more recent timeframe, the quality and completeness of the data.

14. A traffic management system as claimed in any of the preceding claims, in which the stochastic model generation module is an evolutionary model generation module.

15. A traffic management system as claimed in claim 14, in which the evolutionary model generation module is adapted to carry out the steps of:

• generating a population of models of a predetermined population size, each model in the population having a set of hyperparameters defining the model;

• for each model in the population, allocating random values to the model’s hyperparameters;

• training each model using training data and validation data, and testing each model using test data to assign a fitness score to each model;

• selecting a subset of the population as best models based on the fitness score;

• applying a mutation operation to at least some of the best models to change at least one hyperparameter of each mutated model;

• applying a crossover operation to pairs of the best models to create new child models, the child models being based on a combination of the hyperparameters in the pair of models;

• testing whether an end condition is met, and if not, then returning to the training step.

16. A traffic management system as claimed in claim 15, in which the end condition includes a test for hyperconvergence, that is a state where an overall population fitness of the population of models is deteriorating or is not improving over a threshold.

17. A traffic management system as claimed in claim 15 or claim 16, in which the hyperparameters determined by the evolutionary model generation module include at least one hyperparameter relating to the structure of the model.

18. A traffic management system as claimed in claim 17, in which the hyperparameters determined by the evolutionary model generation module include at least a hyperparameter defining the number of layers in a neural network model.

19. A traffic management system as claimed in claim 17 or claim 18, in which the hyperparameters determined by the evolutionary model generation module include at least a hyperparameter defining the number of nodes in a layer of a neural network model.

20. A traffic management system as claimed in claim 16, in which the test data used to determine population fitness in the test for hyperconvergence is different from the test data used to assign a fitness score to each model.

21. A traffic management system as claimed in any of the preceding claims, in which the state of the model generation module is saved to secondary storage for possible recovery, after each time a model is trained and tested.

22. A method of actively managing a road network, the road network having a plurality of sensors for monitoring vehicles and/or other users of the road network, and active traffic management means, the active traffic management means including outputs for controlling the vehicles and/or other users of the road network, the method comprising the steps of:

continuously reading input from the network of sensors and storing data streams from the sensors;

repeatedly applying a prediction model to the data streams to predict a future state of the road network;

using outputs of the active traffic management means to control the vehicles and/or other road users in response to the predicted future states;

repeatedly applying a stochastic algorithm to generate train and test prediction models, and to replace the prediction model used to predict a future state of the road network with a new model generated by the stochastic algorithm, at least the step of repeatedly applying a prediction model and the step of repeatedly applying a stochastic algorithm being carried out substantially in parallel with each other.

23. A method of actively managing a road network as claimed in claim 22, in which the prediction models are neural-network models.

24. A method of actively managing a road network as claimed in claim 23, in which the neural-network models are recurrent neural-network models.

25. A method of actively managing a road network as claimed in claim 24, in which the recurrent neural-network models include long short-term memory cells.

26. A method of actively managing a road network as claimed in any of claims 22 to 25, in which applying the prediction model includes selecting input data based on a hyperparameter forming part of the model definition generated by the stochastic algorithm.

27. A method of actively managing a road network as claimed in any of claims 22 to 26 in which the applying the prediction model includes cleaning input data at least by identifying and filling missing values in input data.

28. A method of actively managing a road network as claimed in claim 27, in which missing values in input data are filled by a historical median value and/or by interpolation.

29. A method of actively managing a road network as claimed in any of claims 22 to 28, in which applying a prediction model includes preparing data by creating a 3D tensor by lagging input data backwards through time by a number of lag steps, in the 3D tensor each column representing a data stream, each row representing a particular timestep and also (along the third axis) one or more previous timesteps.

30. A method of actively managing a road network as claimed in any of claims 22 to 27, the step of applying a stochastic algorithm to generate train and test neural-network prediction models including filtering available input data streams for possible inclusion in generated models.

31 .A method of actively managing a road network as claimed in claim 30, in which filtering the available input data streams takes into account one or more of the following criteria:

• usefulness scores associated with data streams;

• the amount of historical data available in each data stream;

• within the data available for each data stream, the quality and completeness of data over a first threshold; and • within a more recent timeframe, the quality and completeness of the data.

32. A method of actively managing a road network as claimed in any of claims 22 to 31, in which the stochastic algorithm is an evolutionary algorithm.

33. A method of actively managing a road network as claimed in claim 32, in which applying an evolutionary algorithm to generate train and test models includes:

• training each model using training data and validation data to define parameters of the model, and testing each model using test data to assign a fitness score to each model;

34. A method of actively managing a road network as claimed in claim 33, in which the end condition includes a test for hyperconvergence, that is a state where an overall population fitness of the population of models is deteriorating or is not improving over a threshold.

35. A method of actively managing a road network as claimed in claim 34, in which the test data used to determine population fitness in the test for hyperconvergence is different from the test data used to assign a fitness score to each model.

36. A non-transient computer readable medium such as a storage medium, containing instructions which when executed on at least one computer cause the computer to carry out the method of any of claims 22 to 35.