Disclosure of Invention
In order to more clearly illustrate the contents of the present invention, first, the terms of art involved are explained as follows:
floating car: also known as probe cars. The vehicle-mounted positioning device is mounted on the bus and the taxi which run on the urban road.
GNSS: global Navigation Satellite System (Global Navigation Satellite System). Including GPS, Glonass, Galileo, and the beidou satellite navigation system.
Space-time subregion: the fragment area is divided according to two dimensions of time and space, and reflects the situation in a certain space range within a period of time.
Historical track data: the historical trajectory data is trajectory data that is accumulated over a long period of time and stored in a database. The historical track data is dynamically changed data and needs to be updated in time and reprocessed and analyzed periodically to ensure the accuracy of extracting the historical traffic characteristics. The data for each spatio-temporal subregion can be processed in parallel to improve efficiency. Which may be referred to herein simply as historical data.
Real-time trajectory data: the real-time trajectory data is a set of trajectory data in a time segment closest to the current time. Which may be referred to herein simply as real-time data.
Traffic situation: the general term of the comprehensive condition of traffic operation in a certain time and a certain space.
And (3) traffic abnormity: traffic flow disorder caused by traffic accidents, vehicle breakdown, truck dropping, damage or failure of road traffic facilities and other events.
Severity of traffic abnormalities: that is, the severity of the traffic flow disorder is the difference between the normal traffic flow and the traffic flow characteristics after the occurrence of the traffic abnormality.
Traffic anomaly index: a measure of the severity of a traffic abnormality. The range is 0-10, and the larger the numerical value is, the more serious the traffic abnormality is.
The traffic environment is as follows: the sum of all external influences and forces acting on road traffic participants. Including road conditions, traffic facilities, terrain and topography, weather conditions, and other traffic activities of traffic participants.
Map matching: a process of associating geographic coordinates with a city road network.
Peak hour flow: the maximum value of hour traffic flow in a day of a certain urban road section.
Finite mixture model: a mathematical method for modeling complex densities with simple densities. The finite mixture model with the variable set as y and the number of components as K can be expressed as:
response variable: variables that change in accordance with an independent variable are also referred to as dependent variables.
Bayesian information criterion: under incomplete information, the method estimates the partially unknown state by subjective probability and then corrects the occurrence probability by a Bayes formula to obtain an evaluation index of the reliability of the result. The calculation method comprises the following steps:
BIC=-2ln L+k·ln n
in the formula, L is the maximum value of the likelihood function, k is the number of unknown parameters, and n is the sample size.
Likelihood function: the likelihood function is a function of the parameters of the statistical model. The probability that the likelihood function L (θ | X) for a parameter θ is (numerically) equal to the variable X after the given parameter θ, given an output X: l (θ | X) ═ P (X ═ X | θ).
Kullback-Leibler divergence: a measure of the difference between the two probability distributions P and Q.
Jensen-Shannon divergence: is a symmetric form of the Kullback-Leibler divergence.
K-Medoids algorithm: a clustering algorithm selects, for each iteration, a point from the current class whose sum of distances to all other points (in the current class) is the smallest as a new center point.
The invention aims to establish a scheme for identifying road traffic abnormal events by combining historical GNSS positioning data and real-time GNSS positioning data and traffic environment information based on a floating car track recording system. In order to achieve the purpose, the invention provides the following technical scheme:
the implementation premise of the invention is as follows: floating cars (taxis, buses, etc.) carrying GNSS trajectory recorders; the data center has large-scale storage, calculation and real-time task processing capacity.
The application range of the invention is as follows: urban roads (including ground roads and elevated roads) through which the floating vehicles pass.
The implementation steps of the invention comprise:
1) and establishing a space-time subregion. The method comprises the steps of dividing a day into a plurality of time segments, and dividing an implementation area of urban road traffic abnormity detection into a plurality of space segments.
2) And (4) preprocessing data. The GNSS positioning data is subjected to data cleaning, data integration, data conversion and data reduction, and the structuralization degree of the data is improved.
3) And (4) fast map matching. And combining the urban road network data, projecting the GNSS positioning point to an urban map through a map matching algorithm, establishing a matching relation between the positioning point and a road section, and correcting errors caused by positioning drift.
4) And (6) sampling data. Different sampling methods may be employed based on different computational power and accuracy requirements.
5) Analyzing historical track data and extracting characteristics. And establishing a traffic characteristic model by using the historical floating car track data. And for each space-time sub-area, describing traffic characteristics by using probability distribution of travel speed, establishing a traffic characteristic model and carrying out parameter estimation.
6) And analyzing real-time track data and extracting characteristics. And (4) mastering the change dynamics of traffic characteristics by utilizing real-time floating car track data. And describing the current traffic characteristics by using the travel speed probability distribution of the current time-space subregion, establishing a model and carrying out parameter estimation.
7) And (4) detecting the abnormality. The difference between the historical traffic characteristics and the real-time traffic characteristics was measured by Jensen-Shannon divergence.
8) And (4) quantitatively characterizing the severity of the abnormality. And calculating and issuing a traffic condition abnormity index.
The step 1) may specifically adopt the following method:
11) equidistant space-time division method. Determining the segment scale of a time dimension, wherein the span of the time segment is a fixed value, and usually 30min is taken as a time segment; the segment size of the spatial dimension is determined, the span of the spatial segment is a fixed value, and a spatial grid of 200m × 200m is taken as a spatial segment.
12) Non-equidistant spatiotemporal division. For the road network density of more than 2km/km2Or in the urban central area with the peak hour flow rate more than 1000 vehicles/hour, taking a time segment of 30min and a space segment of 200m multiplied by 200m, and regarding the road network density less than 2km/km2Or peak hourAnd taking a time segment of 30min and a space segment of 400m multiplied by 400m in the suburban area with the flow rate less than 1000 vehicles/hour.
The step 3) specifically comprises the following steps:
31) the spatial region to be processed is divided into grids of a certain size, and the range of each grid region can be represented as As={(xs,ys)|xs∈[xr,xr+1),ys∈[yr,yr+1)};
32) Judging a grid area where the positioning point is located, and searching a road section where the positioning point is located by using the distance and the azimuth angle, wherein the matching scheme comprises the following steps:
321) the single-point matching scheme comprises the following steps:
searching a road section closest to the point A, and when the difference value between the driving direction angle meeting the point A and the direction angle of the road section ij is smaller than a threshold value, namely, meeting the condition of thetaA-θij|<δθCompleting matching; if not satisfy | thetaA-θij|<δθAnd deleting the road sections ij in the search space, and continuing to search other road sections until the condition is met. The matching method is shown in fig. 3.
322) Point sequence matching scheme:
this scheme is applicable to high frequency floating car data. Representation of floating car GNSS data acquisition frequency as f
0=1/t
0A point P (t) adjacent to A in time
A-t
0),P(t
A+t
0) 1-neighbor, P (t), defined as A
A-2t
0),P(t
A+2t
0) Defined as 2-neighbors of A, and so on, then P (t)
A-kt
0),P(t
A+kt
0) Defined as k-neighbors of A. At f
0If the frequency is less than 1Hz, k is 1 or 2. Taking the road section ij with the minimum distance between the distance A and the k-adjacent point of the distance A, and calculating the average value of the driving direction angles of the k-adjacent points of the distance A and the distance A
If it satisfies
Completing matching; otherwise, searching other road sections until the condition is met.
33) And calculating the projection coordinate of the GNSS positioning point on the road section by using a linear equation of the road section (approximately splitting into a straight line if the road section is a curve road section), thereby reducing the error caused by GNSS positioning drift. The specific method adopts a GNSS positioning point linear projection method as follows:
determining a linear equation of the road section ij (if the road section is a curve, dividing the road section into a plurality of linear sections):
y-yi=k(x-xi)
the projection line equation is:
solving the projection coordinate P as:
after the map matching process, matching the positioning points to the space-time sub-area by combining the timestamp data of the coordinates of the positioning points.
The step 4) may specifically adopt the following method:
41) the scheme is as a whole. The whole vehicle speed data of each sub-floating vehicle in a space-time subregion form a whole. The implementation method is that the travel speed of each vehicle in a space-time subregion xi is calculated:
wherein d is
1, 2...d
n-1,nIs the distance between the 1 st and the 2 nd GNSS positioning points in the space-time sub-area xi, the distance between the n-1 st and the n th GNSS positioning point, and t
1...t
nThe timestamp of the nth GNSS positioning point is the 1 st within the space-time sub-region xi. Data in each space-time subregion are not screened to form a set V
ξFor subsequent processing.
42) A time-smoothed sampling scheme. Specifying the time segment length and the upper limit of the number of data of the same time segment; and searching the speed data in each time segment in a time-space subregion, and if the number of the speed data in the time segment exceeds the upper limit, randomly taking the data with the upper limit number and adding the data into the data sample to be processed. The implementation method is that the travel speed of each vehicle in a space-time subregion xi is calculated:
wherein d1
,2...d
n-1,nIs the distance between the 1 st and the 2 nd GNSS positioning points in the space-time sub-area xi, the distance between the n-1 st and the n th GNSS positioning point, and t
1...t
nThe timestamp of the nth GNSS positioning point is the 1 st within the space-time sub-region xi. Specifying a time slice length t
pUpper limit of number of data pieces p of same time segment
max(ii) a Searching the speed data in the ith time segment of a time in a space-time subregion, if the number of the speed data in the time segment exceeds the upper limit p
maxRandom access of p
maxStrip data Add V
ξ。
The step 5) may specifically adopt the following method:
51) and (3) a simple historical track data fusion method. And taking the historical data under the condition of no traffic abnormality as a whole to establish a traffic characteristic model and estimate parameters. The method utilizes a finite mixture model to establish a traffic characteristic model and carries out parameter estimation. One of three schemes may be employed:
511) gaussian mixture model with fixed composition
The scheme adopts a mixed Gaussian model with fixed component quantity K to describe the probability distribution of the vehicle speed. The component amounts are manually specified according to the distribution pattern of the vehicle speed in a typical case. In order to ensure the reliability of the probability distribution, the number of components K cannot be too small. Generally, K is 4 to 6.
512) Gaussian mixture model with variable component number
The scheme adopts a model evaluation-based method to select proper component quantity, and the method comprises the following steps:
determining the possible maximum component quantity K, and respectively carrying out parameter estimation on a mixed Gaussian model with n being 1, 2, … and K components; for the K models, the best model is determined by Bayesian Information Criterion (BIC). The number of largest components is generally chosen according to the accuracy requirement, but it must be noted that the larger the number of components, the slower the convergence of the maximization algorithm is expected to be. The maximum ingredient amount selected here is K5, i.e. it needs to be calculated:
there are 5 mixed models. Meanwhile, BIC was calculated for 5 models, which is defined as:
BIC=-2ln L+k·ln n
in the formula, L is the maximum likelihood function value, k is the number of parameters in the model, and n is the total amount of data.
And then, selecting a mixed model with the minimum BIC, and recording parameter vectors eta, mu and sigma of the mixed model as the characteristic record of the space-time sub-area. The density profile of the mixed model is shown in fig. 6.
513) Finite mixed model with variable component quantity and distribution type
The scheme adopts a model-based evaluation method similar to 512), but the distribution form of the subcomponents and the quantity of the subcomponents are variable, and the method comprises the following steps:
selecting M probability distribution models as the distribution types of the subcomponents, including but not limited to: normal distribution, gamma distribution, weibull distribution. When a normal distribution is used, the sub-distribution function takes:
when using a gamma distribution, the sub-distribution function takes:
When using a weibull distribution, the sub-distribution function takes:
assuming that the distribution types of all the subcomponents of the mixture model are the same, the possible maximum number of components K is determined. And selecting the distribution type of the M seed components and the quantity of the K components to form M.K combinations, respectively calculating the BIC value, and taking the model with the minimum BIC as the optimal model.
52) And (4) classifying historical track data by context. According to the air temperature, the precipitation, the visibility and the traffic control measures, the historical data under the condition of no traffic abnormality is divided into different categories, and a model is respectively established and parameter estimation is carried out. The implementation method comprises the following steps:
according to the difference of air temperature, precipitation, visibility and traffic control measures, traffic environments are divided into 5-8 categories, and historical data are classified into the categories according to the difference of the traffic environments corresponding to the historical data. For each category, the processing as described in 51) is performed, so that a mapping relationship R (E → T) is established, where E is the traffic environment and T is the traffic situation.
53) Historical data clustering. And for historical data, obtaining difference quantitative representations of different space regions through comparison between every two space-time sub-regions, and clustering by using the quantized differences. And performing multiple Logit regressions by taking the air temperature, the precipitation, the visibility and the traffic control measures as characteristic factors, and establishing a mapping relation between the traffic environment and the category. The implementation scheme is shown in figure 4. The implementation steps are as follows:
531) establishing a traffic characteristic model and performing parameter estimation according to the method 51).
532) Writing out probability density function p of travel vehicle speed distribution corresponding to different dates of space-time subareas according to the previous finite mixture model parameter estimation resulti(x) The parameters are given by a Gaussian mixture model as an example:
533) calculating Jensen-Shannon divergence d between every two distributionsij:
Where P, Q are two different probability distributions,
d is Kullback-Leibler divergence:
in the case of the finite mixture model, the value cannot be expressed explicitly, but can be approximated by a monte carlo sampling method, which is:
534) writing the divergence between every two distributions into a distance matrix:
the matrix satisfies dij=dji,dij=0(i=j)。
535) And taking the distance matrix as the input of a K-Medoids algorithm to obtain a clustering result, and establishing indexes for categories.
536) And (4) performing multiple Logit regression by taking the category index as a response variable and taking the traffic environment data (including air temperature, precipitation, visibility and the like) as independent variables to obtain a mapping relation R (E → T) between the traffic environment E and the traffic situation category T.
537) And aggregating the data of the same category, reestablishing a mixed model by utilizing the new aggregated data set, and performing parameter estimation to obtain a final historical traffic characteristic data set.
The step 6) may specifically adopt the following method:
61) simple real-time data processing. The method is carried out simultaneously with 51). And carrying out model establishment and parameter estimation on the real-time traffic data to obtain a characteristic function of the current traffic condition. The implementation steps of the method are the same as those of 51), except that the adopted data are real-time traffic data.
62) And (4) a classification treatment method. The method is carried out simultaneously with 52) or 53). And acquiring a characteristic function of the traffic condition, acquiring information such as current air temperature, precipitation, visibility, traffic control measures and the like, and judging the category of the current traffic condition. The implementation scheme is shown in figure 5. The implementation steps are as follows:
621) calculating travel speed in the space-time subarea to form a real-time travel speed overall Vξ,rt;
622) Establishing travel vehicle speed probability distribution model
And estimating parameters;
623) and taking the current traffic environment data (including air temperature, precipitation, visibility and the like) as input parameters, and obtaining the category T of the current traffic situation by utilizing a mapping relation R (E → T).
The step 7) specifically comprises the following steps:
71) when the step 62) is adopted, positioning historical traffic characteristic data under the category according to the category T to which the current traffic situation belongs;
72) description parameter eta according to current traffic characteristicsrt、μrt、σrtAnd historical traffic characteristicsThe characterizing descriptive parameters η, μ, σ calculate the difference between the two velocity distributions: diff | (η)rt,μrt,σrt),(η,μ,σ)|=JSD(PrtP). When the historical traffic characteristics are similar to the real-time traffic characteristics (namely the historical travel vehicle speed distribution and the real-time travel vehicle speed distribution), a smaller Jensen-Shannon divergence value is obtained, namely the difference between the Jensen-Shannon divergence value and the Shannon divergence value is smaller; when the historical traffic characteristics and the real-time traffic characteristics are greatly different, a larger Jensen-Shannon divergence value is obtained, that is, the difference between the two is larger, that is, the probability of abnormality is larger, see fig. 7.
The step 8) specifically comprises the following steps:
81) standardizing the speed distribution difference of each space-time subregion to a normalized value a of 0-1ξ:
82) Calculating the traffic abnormality index A of each space-time subregionξ=aξ×10。
Compared with similar technologies in the same field, the invention has the following advantages:
(1) the method fully utilizes the existing floating car operation data (GNSS track data), detects the change of the traffic state through historical traffic feature extraction and real-time traffic situation analysis, and can realize real-time, low-cost and intelligent detection of the urban road traffic abnormal event;
(2) the probability distribution of the travel speed is used as the description of the traffic characteristics, the reflected characteristics are more comprehensive, the one-sidedness and instability of the traffic characteristics represented by a single index are avoided, and the detection reliability is higher;
(3) aiming at the characteristic that the traffic characteristics are influenced by the traffic environment (such as weather conditions), a clustering-multinomial Logit regression joint algorithm is introduced, and the mapping relation between the traffic environment characteristics and the traffic situation categories is established;
(4) through the inspection of actual data, the urban road traffic anomaly detection technology based on floating car data provided by the invention can realize the detection of an abnormal event with higher accuracy, the detection rate exceeds 90%, the false alarm rate is lower than 20%, a good detection effect is obtained, and the technology can be applied to urban traffic intelligent management and service.
Detailed description of the preferred embodiments
In order to more clearly and clearly express the objects, technical solutions and advantages of the present invention, specific embodiments of the present invention are described in detail below.
As shown in fig. 1, the overall system architecture of the present invention comprises: the vehicle-mounted GNSS track recorder is carried by the floating vehicle, and comprises a data center, a GNSS satellite and a communication system. The GNSS here includes any similar navigation satellite positioning system such as GPS, GLONASS, GALILEO, Beidou, IRNSS, QZSS, etc. The GNSS track recorder carried by the floating vehicles such as taxies, buses and the like records the position information of the vehicles at each time point in the driving process at a certain sampling frequency f (generally, f is required to be more than 0.1Hz), and transmits the position information to the data center in real time through a GPRS mobile communication network (wireless network communication technologies such as WCDMA, TD-LTE and the like can also be adopted, but the cost is correspondingly improved). The data center establishes a historical road traffic characteristic database through data preprocessing and data fusion and a specific algorithm; establishing a real-time traffic characteristic database for recently received real-time data; and judging whether the current traffic characteristics are abnormal or not through the mapping relation between the historical database and the real-time database, and performing visual display through the processing terminal to generate a traffic abnormal event report.
The general flow of the scheme is shown in fig. 2, and comprises the steps of collecting and storing GNSS track data, establishing a space-time subregion, extracting historical traffic characteristics, extracting real-time traffic characteristics, identifying anomalies and the like. The acquisition and storage of GNSS trajectory data is a data basis of the whole scheme, and due to the huge data magnitude, a distributed storage scheme is adopted, and the content of the invention is not the content of the existing mature technology for distributed storage. The method is characterized in that space-time sub-areas are established, the basic assumption is that the same traffic characteristics exist in a specific area and a specific time period, and the assumption is universally applicable after long-term observation. The historical traffic characteristic extraction method is characterized by utilizing GNSS track data to calculate travel speed, utilizing a large amount of travel speed data of the same time-space subregion to establish a probability distribution model of the speed, carrying out parameter estimation and representing traffic characteristics by a small amount of parameters. The principle of the real-time traffic feature extraction is that speed data in the current time period is processed and analyzed, and a current vehicle speed probability distribution model is built. The difference measure index is adopted to judge the change degree of the real-time characteristic compared with the historical characteristic, and whether the traffic abnormal event occurs is determined according to whether the change degree reaches the threshold value.
According to a combination of the implementation methods described in the summary of the invention, the examples are given below.
Example one
Step 11, determining the segment scale of the time dimension by adopting an equidistant space-time division method, wherein the time segment span is a fixed value, and generally taking 30min as a time segment; the segment size of the spatial dimension is determined, the span of the spatial segment is a fixed value, and a spatial grid of 200m × 200m is taken as a spatial segment.
And step 12, preprocessing data, and performing data cleaning, data integration, data conversion and data reduction on the GNSS positioning data to improve the structuralization degree of the data.
Step 13, dividing the spatial region to be processed into grids of a certain size, where the range of each grid region can be represented as as={(xs,ys)|xs∈[xr,xr+1),ys∈[yr,yr+1) }; judging a grid area where the positioning point is located, and searching a road section where the positioning point is located by using the distance and the azimuth angle; searching a road section closest to the point A, and when the difference value between the driving direction angle meeting the point A and the direction angle of the road section ij is smaller than a threshold value, namely, meeting the condition of thetaA-θij|<δθCompleting matching; if not satisfy | thetaA-θij|<δθDeleting the road sections ij in the search space, and continuing to search other road sections until the condition is met; the method comprises the following steps of calculating the projection coordinates of the GNSS positioning points on the road section by using a linear equation of the road section (approximately splitting into a straight line if the road section is a curve road section), and reducing errors caused by GNSS positioning drift, wherein the specific method comprises the following steps:
determining a linear equation of the road section ij (if the road section is a curve, dividing the road section into a plurality of linear sections):
y-yi=k(x-xi)
the projection line equation is:
solving the projection coordinate P as:
after the map matching process, matching the positioning points to the space-time sub-area by combining the timestamp data of the coordinates of the positioning points.
And step 14, forming a whole by all the driving speed data of each sub-floating vehicle in a space-time sub-area. Calculating the travel speed of each vehicle in a space-time sub-area xi:
wherein d is
1,2...d
n-1,nIs the distance between the 1 st and the 2 nd GNSS positioning points in the space-time sub-area xi, the distance between the n-1 st and the n th GNSS positioning point, and t
1...t
nThe timestamp of the nth GNSS positioning point is the 1 st within the space-time sub-region xi. Data in each space-time subregion are not screened to form a set V
ξFor subsequent processing.
And step 15, taking the historical data under the condition of no traffic abnormality as a whole to establish a traffic characteristic model and estimate parameters. The method utilizes a finite mixture model to establish a traffic characteristic model and carries out parameter estimation. Taking the maximum component number K as 5, and respectively carrying out parameter estimation on a Gaussian mixture model with n as 1, 2, … and K components; for the K models, the best model is determined by Bayesian Information Criterion (BIC). And (3) calculating:
there are 5 mixed models. Meanwhile, the BIC of 5 models was calculated:
BIC=-2ln L+k·ln n
in the formula, L is the maximum likelihood function value, k is the number of parameters in the model, and n is the total amount of data.
And then, selecting a mixed model with the minimum BIC, and recording parameter vectors eta, mu and sigma of the mixed model as the characteristic record of the space-time sub-area.
Step 16, carrying out model establishment and parameter estimation on the real-time traffic data to obtain a characteristic function of the current traffic condition, wherein the method has the same steps as the step five, and recording a parameter vector etart、μrt、σrt。
Step 17, according to the description parameter eta of the current traffic characteristicsrt、μrt、σrtAnd the description parameters eta, mu and sigma of the historical traffic characteristics calculate the difference between the two speed distributions: diff [ (eta) ]rt,μrt,σrt),(η,μ,σ)]=JSD(Prt||P)。
Step 18, standardizing the speed distribution difference of each space-time subregion to a normalized value a of 0-1ξi:
Calculating the traffic abnormality index A of each space-time subregionξ=aξ×10。
Example two
Step 21, determining the segment scale of the time dimension by adopting an equidistant space-time division method, wherein the time segment span is a fixed value, and generally taking 30min as a time segment; the segment size of the spatial dimension is determined, the span of the spatial segment is a fixed value, and a spatial grid of 200m × 200m is taken as a spatial segment.
And step 22, preprocessing data, and performing data cleaning, data integration, data conversion and data reduction on the GNSS positioning data to improve the structuralization degree of the data.
Step 23, dividing the spatial region to be processed into grids of a certain size, where the range of each grid region can be represented as as={(xs,ys)|xs∈[xr,xr+1),ys∈[yr,yr+1) }; judging a grid area where the positioning point is located, and searching a road section where the positioning point is located by using the distance and the azimuth angle; searching a road section closest to the point A, and when the difference value between the driving direction angle meeting the point A and the direction angle of the road section ij is smaller than a threshold value, namely, meeting the condition of thetaA-θij|<δθCompleting matching; if not satisfy | thetaA-θij|<δθDeleting the road sections ij in the search space, and continuing to search other road sections until the condition is met; benefit toCalculating the projection coordinate of the GNSS positioning point on the road section by using a linear equation of the road section (approximately splitting into a straight line if the road section is a curve road section), and reducing the error caused by GNSS positioning drift, wherein the specific method comprises the following steps:
determining a linear equation of the road section ij (if the road section is a curve, dividing the road section into a plurality of linear sections):
y-yi=k(x-xi)
the projection line equation is:
solving the projection coordinate P as:
after the map matching process, matching the positioning points to the space-time sub-area by combining the timestamp data of the coordinates of the positioning points.
Step 24, calculating the travel speed of each vehicle in the space-time sub-area xi:
wherein d is
1, 2...d
n-1,nIs the distance between the 1 st and the 2 nd GNSS positioning points in the space-time sub-area xi, the distance between the n-1 st and the n th GNSS positioning point, and t
1...t
nThe timestamp of the nth GNSS positioning point is the 1 st within the space-time sub-region xi. Specifying a time slice length t
pUpper limit of number of data pieces p of same time segment
max(ii) a Searching the speed data in the ith time segment of a time in a space-time subregion, if the number of the speed data in the time segment exceeds the upper limit p
maxRandom access of p
maxStrip data Add V
ξ。。
And 25, taking the historical data under the condition of no traffic abnormality as a whole to establish a traffic characteristic model and estimate parameters. The method utilizes a finite mixture model to establish a traffic characteristic model and carries out parameter estimation. Taking the maximum component number K as 5, and respectively carrying out parameter estimation on a Gaussian mixture model with n as 1, 2, … and K components; for the K models, the best model is determined by Bayesian Information Criterion (BIC). And (3) calculating:
there are 5 mixed models. Meanwhile, the BIC of 5 models was calculated:
BIC=-2ln L+k·ln n
in the formula, L is the maximum likelihood function value, k is the number of parameters in the model, and n is the total amount of data.
And then, selecting a mixed model with the minimum BIC, and recording parameter vectors eta, mu and sigma of the mixed model as the characteristic record of the space-time sub-area.
According to the parameter estimation result, writing out the probability density function p of the travel vehicle speed distribution of the space-time subarea corresponding to different datesi(x):
Calculating Jensen-Shannon divergence d between every two distributionsij:
Where P, Q are two different probability distributions,
d is Kullback-Leibler divergence:
under the condition of adopting a finite mixture model, adopting a Monte Carlo sampling method to approximate calculation, wherein the calculation method comprises the following steps:
writing the divergence between every two distributions into a distance matrix:
the matrix satisfies dij=dji,dij=0(i=j)。
And taking the distance matrix as the input of a K-Medoids algorithm to obtain a clustering result, and establishing indexes for categories.
And (4) performing multiple Logit regression by taking the category index as a response variable and taking the traffic environment data (including air temperature, precipitation, visibility and the like) as independent variables to obtain a mapping relation R (E → T) between the traffic environment E and the traffic situation category T.
And aggregating the data of the same category, reestablishing a mixed model by utilizing the new aggregated data set, and performing parameter estimation to obtain a final historical traffic characteristic data set.
And step 26, acquiring a characteristic function of the traffic condition, acquiring information such as current air temperature, precipitation, visibility and traffic control measures, and judging the type of the current traffic condition.
Calculating travel speed in the space-time subarea to form a real-time travel speed overall Vξ,rt;
Establishing travel vehicle speed probability distribution model
And estimating parameters;
and taking the current traffic environment data (including air temperature, precipitation, visibility and the like) as input parameters, and obtaining the category T of the current traffic situation by utilizing a mapping relation R (E → T).
Step 27, positioning historical traffic characteristic data under the category according to the category T to which the current traffic situation belongs; description parameter eta according to current traffic characteristicsrt、μrt、σrtAnd the description parameters eta, mu and sigma of the historical traffic characteristics calculate the difference between the two speed distributions: diff [ (eta) ]rt,μrt,σrt),(η,μ,σ)]=JSD(Prt||P)。
Step 28, standardizing the speed distribution difference of each space-time subregion to a normalized value a of 0-1ξ:
Calculating the traffic abnormality index A of each space-time subregionξ=aξ×10。
EXAMPLE III
Step 31, adopting a non-equidistant space-time division method to obtain a road network with the density of more than 2km/km2Or in the urban central area with the peak hour flow rate more than 1000 vehicles/hour, taking a time segment of 30min and a space segment of 200m multiplied by 200m, and regarding the road network density less than 2km/km2Or suburban areas with peak hourly traffic less than 1000 vehicles/hour, and a time segment of 30min and a space segment of 400m × 400m are taken.
And step 32, preprocessing data, performing data cleaning, data integration, data conversion and data reduction on the GNSS positioning data, and improving the structuralization degree of the data.
Step 33, dividing the spatial region to be processed into meshes of a certain size, where the range of each mesh region can be represented as as={(xs,ys)|xs∈[xr,xr+1),ys∈[yr,yr+1)};
Representation of floating car GNSS data acquisition frequency as f
0=1/t
0Time of dayUpper point P (t) adjacent to A
A-t
0),P(t
A+t
0) 1-neighbor, P (t), defined as A
A-2t
0),P(t
A+2t
0) Defined as 2-neighbors of A, and so on, then P (t)
A-kt
0),P(t
A+kt
0) Defined as k-neighbors of A. At f
0If the frequency is less than 1Hz, k is 1 or 2. Taking the road section ij with the minimum distance between the distance A and the k-adjacent point of the distance A, and calculating the average value of the driving direction angles of the k-adjacent points of the distance A and the distance A
If it satisfies
Completing matching; otherwise, searching other road sections until the condition is met.
And calculating the projection coordinate of the GNSS positioning point on the road section by using a linear equation of the road section (approximately splitting into a straight line if the road section is a curve road section), thereby reducing the error caused by GNSS positioning drift. The specific method comprises the following steps:
determining a linear equation of the road section ij (if the road section is a curve, dividing the road section into a plurality of linear sections): y-yi=k(x-xi)
the projection line equation is:
solving the projection coordinate P as:
after the map matching process, matching the positioning points to the space-time sub-area by combining the timestamp data of the coordinates of the positioning points.
Step 34, calculating the travel speed of each vehicle in the space-time sub-area xi:
wherein d is
1, 2...d
n-1,nIs the distance between the 1 st and the 2 nd GNSS positioning points in the space-time sub-area xi, the distance between the n-1 st and the n th GNSS positioning point, and t
1...t
nThe timestamp of the nth GNSS positioning point is the 1 st within the space-time sub-region xi. Specifying a time slice length t
pUpper limit of number of data pieces p of same time segment
max(ii) a Searching the speed data in the ith time segment of a time in a space-time subregion, if the number of the speed data in the time segment exceeds the upper limit p
maxRandom access of p
maxStrip data Add V
ξ。。
And step 35, taking the historical data under the condition of no traffic abnormality as a whole to establish a traffic characteristic model and estimate parameters. The method utilizes a finite mixture model to establish a traffic characteristic model and carries out parameter estimation. Taking the maximum component number K as 5, and respectively carrying out parameter estimation on a Gaussian mixture model with n as 1, 2, … and K components; for the K models, the best model is determined by Bayesian Information Criterion (BIC). And (3) calculating:
there are 5 mixed models. Meanwhile, the BIC of 5 models was calculated:
BIC=-2ln L+k·ln n
in the formula, L is the maximum likelihood function value, k is the number of parameters in the model, and n is the total amount of data.
And then, selecting a mixed model with the minimum BIC, and recording parameter vectors eta, mu and sigma of the mixed model as the characteristic record of the space-time sub-area.
According to the parameter estimation result, writing out the probability density function p of the travel vehicle speed distribution of the space-time subarea corresponding to different datesi(x):
Calculating Jensen-Shannon divergence d between every two distributionsij:
Where P, Q are two different probability distributions,
d is Kullback-Leibler divergence:
under the condition of adopting a finite mixture model, adopting a Monte Carlo sampling method to approximate calculation, wherein the calculation method comprises the following steps:
writing the divergence between every two distributions into a distance matrix:
the matrix satisfies dij=dji,dij=0(i=j)。
And taking the distance matrix as the input of a K-Medoids algorithm to obtain a clustering result, and establishing indexes for categories.
And (4) performing multiple Logit regression by taking the category index as a response variable and taking the traffic environment data (including air temperature, precipitation, visibility and the like) as independent variables to obtain a mapping relation R (E → T) between the traffic environment E and the traffic situation category T.
And aggregating the data of the same category, reestablishing a mixed model by utilizing the new aggregated data set, and performing parameter estimation to obtain a final historical traffic characteristic data set.
And step 36, acquiring a characteristic function of the traffic condition, acquiring information such as current air temperature, precipitation, visibility and traffic control measures, and judging the category of the current traffic condition.
Calculating travel speed in the space-time subarea to form a real-time travel speed overall Vξ,rt;
Establishing travel vehicle speed probability distribution model
And estimating parameters;
and taking the current traffic environment data (including air temperature, precipitation, visibility and the like) as input parameters, and obtaining the category T of the current traffic situation by utilizing a mapping relation R (E → T).
Step 37, positioning historical traffic characteristic data under the category according to the category T to which the current traffic situation belongs; description parameter eta according to current traffic characteristicsrt、μrt、σrtAnd the description parameters eta, mu and sigma of the historical traffic characteristics calculate the difference between the two speed distributions: diff [ (eta) ]rt,μrt,σrt),(η,μ,σ)]=JSD(Prt||P)。
Step 38, standardizing the speed distribution difference of each space-time subregion to a normalized value a of 0-1ξ:
Calculating the traffic abnormality index A of each space-time subregionξ=aξ×10。