Disclosure of Invention
To address two shortcomings of prior-art traffic accident location positioning methods, namely that the resulting location is not accurate enough and that accident locations in historical data cannot be processed comprehensively, the application provides a traffic accident location positioning method based on natural language processing. The method is suitable for locating accident sites under the various conditions found in historical data and ensures that the site is finally positioned on a road. The application also provides a traffic accident location positioning system based on natural language processing.
The technical scheme of the invention is as follows: a traffic accident location method based on natural language processing comprises the following steps:
S1: acquiring information data of a traffic accident site to be processed, expressed in natural language, and recording it as: data to be processed;
The method is characterized by further comprising the following steps:
S2: manufacturing a stop word list for analyzing traffic accident location information and a regular expression for word segmentation;
S3: analyzing the data to be processed based on the stop word list and the regular expression to obtain corresponding accident positioning information;
The accident positioning information includes: cities, roads, streets and places where accidents occur;
S4: based on the road retrieval technology, acquiring digital road information data corresponding to the accident occurrence road and based on longitude and latitude description, and obtaining a longitude and latitude data set describing the accident occurrence road, wherein the longitude and latitude data set is recorded as: an initial road longitude and latitude data set;
S5: judging whether the accident occurrence road is an information sparse road based on the data in the initial road longitude and latitude data set;
if yes, executing step S6;
otherwise, recording the initial road longitude and latitude data set as the longitude and latitude data set R of the road to be processed, and executing step S7;
R = [R1, R2, …, Rn], wherein n is the number of longitude and latitude points included in the data set and is a positive integer;
S6: performing expansion point density operation on the initial road longitude and latitude data set, expanding the number of data in the data set to obtain a digital road set, and marking the digital road set as: a longitude and latitude data set R of a road to be processed;
S7: using a geocoding technology, resolving accident location text information in the accident positioning information into longitude and latitude digital information locations, and marking the longitude and latitude digital information locations as: an original accident site P;
if the original accident site P can be found, step S8 is performed;
Otherwise, the P point cannot be found, the original data is judged to be wrong, error information is returned, and the calculation is finished;
S8: carrying out road projection calculation on the original accident site P and the longitude and latitude data set R of the road to be processed, and judging a final positioning point corresponding to the data to be processed;
The road projection calculation specifically comprises the following steps:
a1: calculating the spatial distance between point P and every point in the longitude and latitude data set R of the road to be processed, and finding the 3 points Ri, Rj and Rk in R nearest to P;
a2: judging the positional relationship of points Ri, Rj and Rk; if Ri, Rj and Rk are on the same straight line, executing step a3;
Otherwise, the three points are not collinear, and step a4 is executed;
a3: judging whether P and points Ri, Rj and Rk are on the same straight line; if so, judging point P to be the final positioning point corresponding to the data to be processed;
Otherwise, executing the step a5;
a4: drawing a triangle with points Ri, Rj and Rk as vertices, and recording the triangle as: positioning area;
Judging whether the point P is in the positioning area or not;
if yes, judging the P point as a final positioning point corresponding to the data to be processed;
Otherwise, executing the step a5;
a5: finding the point in R closest to P, recorded as: closest point Rm;
The distance between Rm and P is recorded as: Dm;
a6: comparing Dm with a preset positioning threshold;
If Dm is smaller than a positioning threshold value, judging an Rm point as a final positioning point corresponding to the data to be processed;
otherwise, judging that a final positioning point corresponding to the data to be processed cannot be found.
It is further characterized by:
Based on longitude and latitude, the method for calculating the spatial distance between two points comprises the following steps:
Let the two points be x and y, where lon_x and lat_x are the longitude and latitude of point x, and lon_y and lat_y are the longitude and latitude of point y, all in degrees; the distance dist from point x to point y, in meters, is then calculated as follows:
lat_x = lat_x × π/180;
lon_x = lon_x × π/180;
lat_y = lat_y × π/180;
lon_y = lon_y × π/180;
Δlat = lat_y − lat_x;
Δlon = lon_y − lon_x;
a = sin(Δlat / 2)^2 + cos(lat_y) * cos(lat_x) * sin(Δlon / 2)^2;
c = 2 * arcsin(sqrt(a));
dist = 6371 * c * 1000;
in step S5, the method for judging the information sparse road includes:
b1: calculating the spatial distance between all adjacent points in the longitude and latitude data set of the road to be processed;
b2: calculating the average value of all the distances to obtain the average distance between adjacent points;
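b3: comparing the average distance between adjacent points with a preset distance threshold; if the average distance is greater than the distance threshold, the road is judged to be an information sparse road;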
In step S6, the expanding point density operation includes the following steps:
c1: confirming all adjacent points in the initial road longitude and latitude data set;
c2: calculating the number of points to be inserted, and then interpolating between all adjacent points by mean filling to obtain: an intermediate data set Rt;
inset_num = ceil(mean_dist / distance threshold)
Wherein inset_num is the number of points to be inserted, ceil is the round-up function, and mean_dist is the average distance between all adjacent points; the distance threshold is the preset ideal spacing between adjacent points;
c3: the intermediate data set Rt is noted as: a longitude and latitude data set R of a road to be processed;
In step S7, the method for locating the original accident site P includes the following steps:
d1: determining an online geocoding service interface using an online map, transcoding the information of the city, the road and the accident site obtained by analysis, and writing the information into the online geocoding service interface;
d2: accessing the online geocoding service interface to acquire returned json data;
d3: analyzing the json data, and taking out longitude and latitude coordinate points in the json data to obtain the original accident site P;
P = [lonP, latP], wherein lon represents the longitude coordinate and lat represents the latitude coordinate;
in step a6, when it is determined that the final positioning point corresponding to the data to be processed cannot be found, the positioning area with points Ri, Rj and Rk as vertices is fed back to the user as the reference area of the accident occurrence place.
A natural language processing-based traffic accident location positioning system, comprising: the system comprises an accident site analysis module, a road retrieval module, a road encryption module, a geocoding module and a projection positioning module;
After the data to be processed is sent to the accident location analysis module, analyzing and obtaining accident positioning information corresponding to the data to be processed by the accident location analysis module based on a stop word list and a regular expression for word segmentation; the accident positioning information includes: cities, roads, streets and places where accidents occur; and sending the accident positioning information to the road retrieval module and the geocoding module;
The road retrieval module is used for carrying out road information retrieval on the accident occurrence road based on the offline map interface, acquiring corresponding digital road information data based on longitude and latitude description, and acquiring an initial road longitude and latitude data set; the initial road longitude and latitude data set is sent to the road encryption module;
The road encryption module judges whether the accident occurrence road is an information sparse road; if yes, the expansion point density operation is performed on the sparse initial road longitude and latitude data set, expanding the overly sparse digital road array into a denser array to obtain the longitude and latitude data set R of the road to be processed, R = [R1, R2, …, Rn]; otherwise, the accident occurrence road has enough longitude and latitude points, and the initial road longitude and latitude data set is directly recorded as: the longitude and latitude data set R of the road to be processed; the longitude and latitude data set R of the road to be processed is sent to the projection positioning module;
In the geocoding module, the accident positioning information transmitted by the accident location analysis module is converted into corresponding longitude and latitude coordinate points by using a geocoding interface and is recorded as: an original accident site P; the original accident site P is sent to the projection positioning module;
And the projection positioning module performs road projection calculation based on the received original accident site P and the longitude and latitude data set R of the road to be processed, and judges a final positioning point corresponding to the data to be processed.
It is further characterized by:
the stop word list includes: place adverbs, intersections, roads, road sections and places;
After the accident location analysis module extracts the accident positioning information, the accident positioning information is standardized, and its data form is unified as: XX city, XX district, XX street, XX road, XX;
The geocoding module further comprises reliability analysis operation, and when the geocoding interface returns the longitude and latitude coordinate points corresponding to the accident positioning information, the returned data are recorded as: data to be judged; the geocoding module performs reliability analysis on the data to be judged, calculates confidence of the data to be judged, and if the confidence is greater than 60, considers the coding result to be reliable, and sets the longitude and latitude as the original accident site P; if not, refining accident site text information, and re-sending the accident site text information to the geocoding interface until the confidence coefficient of the data to be judged is more than 60;
The operation of refining the accident site text information comprises the following steps: acquiring the text of the accident positioning information of the last time of the geocoding interface, adding preset high-frequency address words one by one in the corresponding fields of streets, roads and places, and sending the words to the geocoding interface for repositioning after adding one high-frequency address word each time;
the high frequency address term includes: bus stops, communities, hu-he, dao, lane, department, fork, street, and road.
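The reliability check and refinement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `geocode` is a hypothetical callable standing in for the geocoding interface, the word list is an illustrative subset, each word is simply appended to the whole address text rather than to a specific field, and the loop stops once the list is exhausted.

```python
# Illustrative subset of the preset high-frequency address words (assumption).
HIGH_FREQUENCY_WORDS = ["bus stop", "community", "lane", "street", "road"]


def geocode_with_refinement(address_text, geocode, min_confidence=60):
    """Return (lon, lat) once the confidence exceeds min_confidence, else None.

    `geocode` is a hypothetical callable: text -> (lon, lat, confidence).
    """
    # Try the original text first, then the text with one added word at a time.
    candidates = [address_text] + [f"{address_text} {w}" for w in HIGH_FREQUENCY_WORDS]
    for text in candidates:
        lon, lat, confidence = geocode(text)
        if confidence > min_confidence:
            return (lon, lat)
    return None  # no candidate reached a reliable confidence
```

Returning `None` when every candidate fails is a design choice of this sketch; the text above only specifies repeating the request until the confidence exceeds 60.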
The traffic accident location positioning method based on natural language processing addresses the problem that a spatial location derived directly from traffic accident information entered in natural language is inaccurate. Accident positioning information is obtained by text analysis that combines a stop word list with regular expressions for word segmentation, which yields standardized accident location text. A road retrieval technology is used to acquire the digital road information data set R, described by longitude and latitude, that corresponds to the accident road, and a geocoding technology resolves the accident location text in the accident positioning information into a longitude and latitude location, giving the original accident site P. The road projection calculation then projects point P onto the digital road information data set R. Even if the accident location entered by the case handler is not on the accident road represented by R, a mapping point for P can be found on that road, so that the final positioning point corresponding to the data to be processed can be accurately found on the accident road. The method is suitable for locating accident sites under the various conditions found in historical data and ensures that the site is finally positioned on the accident road.
Detailed Description
In the embodiment shown in the schematic view of the accident site in fig. 6, Ring Lake Road is built around a central lake; the lake has a central island on which Lake Center Building A and Lake Center Building B stand, and Ring Lake Road has branches 1 to 3. If a traffic accident occurs on Ring Lake Road at a location between branch 1 and branch 3, accident handling personnel, in actual practice or in historical data, may describe the accident site as, for example: opposite Lake Center Building A on Ring Lake Road. However, the actual address of Lake Center Building A is not on Ring Lake Road. Therefore, if "Ring Lake Road" and "Lake Center Building A" are directly extracted for retrieval, an accurate accident site cannot be obtained. Based on the input of the accident handling personnel, the method can position the accident to the actual place of occurrence on Ring Lake Road.
In order to accurately analyze accident addresses based on natural language expression, the application provides a traffic accident location positioning method based on natural language processing, which comprises the following steps as shown in fig. 1.
S1: acquiring information data of a traffic accident site to be processed based on natural language expression, and recording the information data as: data to be processed.
In this embodiment, examples of the data to be processed based on the natural language expression are as follows:
Case conditions: At X:XX on X month X, 2023, XX, driving a vehicle with license plate Su XXXXXXX, travelled to the point opposite Lake Center Building A on Ring Lake Road, X Lake District, Wuxi City, and was involved in a traffic accident with an electric bicycle ridden by Wang X, causing personal injury and vehicle damage.
Accident site: opposite Lake Center Building A, Ring Lake Road, Binhu District.
S2: manufacturing a stop word list for analyzing traffic accident location information and a regular expression for word segmentation;
the word segmentation content of the regular expression for word segmentation comprises: case, accident address and basic facts.
Stop words are characters or words that, in information retrieval, are automatically filtered out before or after natural language data (or text) is processed in order to save storage space and improve search efficiency. The stop words are entered manually; in this application they are compiled specifically from historical traffic accident data, so that a targeted stop word list is obtained. For example, the stop word list comprises: place adverbs, intersections, roads, road sections and places.
The place adverbs include: at, located, open, to, etc. In practical application, the place adverbs commonly used in traffic accident descriptions can be obtained by statistics on historical data.
The data to be processed in the application is information described in natural language. A regular expression is a text matching pattern that can search for specific character strings; it is an important technical component of natural language processing and a technical means commonly used in fields such as speech recognition and translation software. It can be implemented with the prior art and is not described further here.
S3: based on the stop word list and the regular expression for word segmentation, analyzing the data to be processed by using a word segmentation algorithm in the prior art to obtain corresponding accident positioning information;
the accident positioning information includes: cities, roads, streets and places where accidents occur.
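Steps S2 and S3 can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration: the stop words are English stand-ins for place adverbs, the input string is a hypothetical English rendering of an accident address, the pattern covers only city, district, road and place (the street field is omitted), and whitespace tokenization replaces a real word segmentation algorithm.

```python
import re

# Hypothetical English stand-ins for the place adverbs in the stop word list.
STOP_WORDS = {"at", "located", "to", "opposite"}

# Hypothetical pattern for an address of the form
# "<name> City <name> District <name> Road <place>".
ADDRESS_PATTERN = re.compile(
    r"(?P<city>\w+ City)\s+(?P<district>\w+ District)\s+"
    r"(?P<road>.+? Road)\s+(?P<place>.+)"
)


def parse_location(text):
    """Drop stop words, then split the remaining text into address fields."""
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
    match = ADDRESS_PATTERN.match(" ".join(tokens))
    return match.groupdict() if match else None


info = parse_location(
    "located at Example City Lakeside District Ring Lake Road "
    "opposite Lake Center Building A"
)
# info["road"] is "Ring Lake Road"; info["place"] is "Lake Center Building A"
```

The road field feeds the road retrieval in step S4, while the full set of fields feeds the geocoding in step S7.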
S4: based on the road retrieval technology, acquiring digital road information data corresponding to the accident occurrence road and based on longitude and latitude description, and obtaining a longitude and latitude data set describing the accident occurrence road, wherein the longitude and latitude data set is recorded as: an initial road longitude and latitude data set.
In the present application, an offline map road retrieval technology is used, for example: the description information of the accident occurrence road is retrieved using the road search interface in the API provided by the Baidu offline map. With the city name and the road name as input, the digital information of the road where the accident site is located is retrieved; this information comprises a plurality of longitude and latitude points, which constitute the initial road longitude and latitude data set. However, the number of longitude and latitude points returned differs between roads and between search interfaces: some roads have ample points, while the digital information of other roads, such as remote rural roads, contains only sparse points. To locate the accident site accurately, the road information must be rich enough, so the method needs to judge, based on the data in the initial road longitude and latitude data set, whether the road is an information sparse road.
S5: judging whether the accident occurrence road is an information sparse road or not based on data in the longitude and latitude data set of the initial road;
if yes, executing step S6;
Otherwise, the initial road longitude and latitude data set is recorded as the longitude and latitude data set R of the road to be processed, and step S7 is executed;
R = [R1, R2, …, Rn], wherein n is the number of longitude and latitude points included in the data set and is a positive integer.
In step S5, the method for judging the information sparse road is as follows:
b1: calculating the spatial distance between all adjacent points in a longitude and latitude data set of a road to be processed based on the longitude and latitude;
The spatial distance is the actual distance between two longitude and latitude points. Whether the point density of the road needs to be expanded is judged from the average distance between all adjacent points; the distance between each pair of adjacent points could also be judged individually, but the amount of calculation would be large.
The specific calculation method is as follows:
lat_j = lat_j × π/180;
lon_j = lon_j × π/180;
lat_i = lat_i × π/180;
lon_i = lon_i × π/180;
Δlat = lat_i − lat_j;
Δlon = lon_i − lon_j;
a = sin(Δlat / 2)^2 + cos(lat_i) * cos(lat_j) * sin(Δlon / 2)^2;
c = 2 * arcsin(sqrt(a));
dist_ij = 6371 * c * 1000;
Wherein lat represents the latitude coordinate, lon represents the longitude coordinate, and dist_ij represents the distance between adjacent points i and j;
b2: calculating the average value of all the distances to obtain the average distance between adjacent points;
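b3: comparing the average distance between adjacent points with a preset distance threshold; if the average distance is greater than the distance threshold, the road is judged to be an information sparse road.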
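Steps b1 to b3 can be sketched as below. The distance function follows the formulas given above (Earth radius 6371 km, result in meters). The function names, the representation of a road as a list of (lat, lon) tuples in degrees, and the reading that a road is sparse when the mean spacing exceeds the threshold are assumptions of this sketch; a road must contain at least two points.

```python
import math

EARTH_RADIUS_M = 6371 * 1000  # same radius as in the formulas above, in meters


def haversine_m(lat_a, lon_a, lat_b, lon_b):
    """Distance in meters between two points given in degrees."""
    lat_a, lon_a, lat_b, lon_b = map(math.radians, (lat_a, lon_a, lat_b, lon_b))
    d_lat = lat_b - lat_a
    d_lon = lon_b - lon_a
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(lat_a) * math.cos(lat_b) * math.sin(d_lon / 2) ** 2)
    return EARTH_RADIUS_M * 2 * math.asin(math.sqrt(a))


def mean_adjacent_distance(road):
    """b1 and b2: average spacing between consecutive (lat, lon) points."""
    gaps = [haversine_m(*p, *q) for p, q in zip(road, road[1:])]
    return sum(gaps) / len(gaps)


def is_sparse(road, distance_threshold_m=20.0):
    """b3: compare the mean spacing with the preset distance threshold."""
    # Assumption: "sparse" means the mean spacing exceeds the threshold.
    return mean_adjacent_distance(road) > distance_threshold_m
```

For example, points spaced 0.001 degrees of latitude apart are roughly 111 m from each other, so such a road would be treated as sparse with the 20 m threshold used in this embodiment.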
S6: performing the expansion point density operation on the sparse initial road longitude and latitude data set, expanding the number of data in the data set to obtain a digital road set, recorded as: the longitude and latitude data set R of the road to be processed.
In step S6, the expansion point density operation includes the following steps:
c1: confirming all adjacent points in the initial road longitude and latitude data set;
c2: first calculating the number of points to be inserted, then interpolating between all adjacent points by mean filling; the longitude and latitude points of the initial road longitude and latitude data set together with the inserted points form an intermediate data set Rt;
inset_num = ceil(mean_dist / distance threshold)
Wherein inset_num is the number of points to be inserted, ceil is the round-up function, and mean_dist is the average distance between all adjacent points; the distance threshold is the preset ideal spacing between adjacent points.
In this embodiment, the distance threshold is set to 20m, that is, when the distance between adjacent longitude and latitude points in the longitude and latitude data set R of the road to be processed is less than or equal to 20m, the point density of the longitude and latitude points required for calculation can be satisfied, and accurate positioning to the accident site can be ensured;
The mean filling method used here is as follows: taking two adjacent points as the start point and the end point, the value of inset_num is calculated, and inset_num points are then added uniformly between the start point and the end point.
After the points are added, the distance between every two adjacent points between the start point and the end point is the same. For example: the distance between adjacent points A and B is 60 m and mean_dist of the road is 40 m, so:
inset_num = ceil(40/20) = 2;
Then, with A as the start point and B as the end point, 2 points are added evenly in between: c1 and c2;
The AB segment becomes three equidistant segments Ac1, c1c2 and c2B, each 20 m long.
c3: the intermediate data set Rt is recorded as: the longitude and latitude data set R of the road to be processed;
R = [R1, R2, …, Rn], where Ri = [lat_i, lon_i] (i = 1, 2, 3, …, n), i.e. each point contains latitude and longitude information.
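Steps c1 to c3 can be sketched as follows. This is a minimal illustration, assuming that "mean filling" amounts to linear interpolation of latitude and longitude between each pair of adjacent points, which is a reasonable approximation over short road segments; the function name and the (lat, lon) tuple representation are hypothetical, and the road must be non-empty.

```python
import math


def densify(road, mean_dist_m, distance_threshold_m=20.0):
    """Insert inset_num evenly spaced points between every adjacent pair.

    road is a list of (lat, lon) tuples; mean_dist_m is the average
    spacing between adjacent points computed in step S5.
    """
    inset_num = math.ceil(mean_dist_m / distance_threshold_m)
    dense = []
    for (lat_a, lon_a), (lat_b, lon_b) in zip(road, road[1:]):
        dense.append((lat_a, lon_a))
        for k in range(1, inset_num + 1):
            t = k / (inset_num + 1)  # fraction of the way from start to end
            dense.append((lat_a + t * (lat_b - lat_a),
                          lon_a + t * (lon_b - lon_a)))
    dense.append(road[-1])  # keep the final original point
    return dense
```

With mean_dist = 40 m and a 20 m threshold, inset_num is 2, so each pair of adjacent points gains two intermediate points and every original segment is split into three equal parts, matching the A, c1, c2, B example above.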
S7: using a geocoding technology, resolving accident location text information in accident positioning information into longitude and latitude digital information locations, and marking the longitude and latitude digital information locations as: the original accident site P. If the P point cannot be found, the original data is shown to be wrong, error information is returned, and the calculation is finished.
In step S7, the positioning method of the original accident site P includes the following steps:
d1: determining a geocoding interface of an online map, such as the geocoding interface provided by the Baidu offline map API, transcoding the city, road and accident site information obtained by analysis, writing it into the online geocoding service interface, and resolving the structured address (province/city/district/street/house number) into corresponding position coordinates;
d2: accessing an online geocoding service interface to acquire returned json data;
d3: analyzing json data, and taking out longitude and latitude coordinate points in the json data to obtain an original accident site P;
P = [lonP, latP], where lon represents the longitude coordinate and lat represents the latitude coordinate.
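Step d3 can be sketched as below. The field names `result`, `location`, `lng` and `lat` are assumptions modeled on a typical geocoding response and are not taken from the text above; the sample string is invented for illustration, and no network request is made.

```python
import json


def parse_geocode_response(raw_json):
    """Return P as [lon, lat], or None when no coordinate is present."""
    data = json.loads(raw_json)
    # Assumed response shape: {"result": {"location": {"lng": ..., "lat": ...}}}
    location = (data.get("result") or {}).get("location")
    if not location:
        return None  # corresponds to "point P cannot be found" in step S7
    return [location["lng"], location["lat"]]


sample = '{"status": 0, "result": {"location": {"lng": 120.25, "lat": 31.5}}}'
P = parse_geocode_response(sample)  # [120.25, 31.5]
```

A `None` result would lead to the error branch of step S7, where the original data is judged to be wrong.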
S8: and carrying out road projection calculation on the original accident site P and the longitude and latitude data set R of the road to be processed, and judging a final positioning point corresponding to the data to be processed.
The road projection calculation specifically comprises the following steps:
a1: based on longitude and latitude, calculating the spatial distance between point P and every point in the longitude and latitude data set R of the road to be processed; the specific calculation method is as follows:
lat_P = lat_P × π/180;
lon_P = lon_P × π/180;
lat_i = lat_i × π/180;
lon_i = lon_i × π/180;
Δlat = lat_i − lat_P;
Δlon = lon_i − lon_P;
a = sin(Δlat / 2)^2 + cos(lat_i) * cos(lat_P) * sin(Δlon / 2)^2;
c = 2 * arcsin(sqrt(a));
dist_i = 6371 * c * 1000;
Wherein lon_P and lat_P respectively represent the longitude and latitude of point P, lon_i and lat_i respectively represent the longitude and latitude of the i-th point in the set R, and dist_i represents the distance from point P to point Ri; a distance set D = [dist_1, dist_2, dist_3, …, dist_n] is finally obtained;
finding the 3 points Ri, Rj and Rk in R nearest to P, namely: finding the longitude and latitude points corresponding to the three smallest values in the set D, wherein Ri, Rj, Rk ∈ R;
a2: judging the positional relationship of points Ri, Rj and Rk; if Ri, Rj and Rk are on the same straight line, executing step a3;
Otherwise, the three points are not collinear, and step a4 is executed;
a3: judging whether P and points Ri, Rj and Rk are on the same straight line; if so, judging point P to be the final positioning point corresponding to the data to be processed;
Otherwise, executing step a5;
If P and Ri, Rj and Rk are on the same straight line, P is necessarily on the same road as the three points, so point P is the final positioning point of the accident;
a4: drawing a triangle with points Ri, Rj and Rk as vertices, and recording the triangle as: positioning area;
judging whether the point P is in the positioning area or not;
If yes, judging the P point as a final positioning point corresponding to the data to be processed;
Otherwise, executing the step a5;
If point P is within the triangular area enclosed by Ri, Rj and Rk, P is necessarily on the same road as the three points, so point P is the final positioning point of the accident;
a5: finding the point in R closest to P, recorded as: closest point Rm;
The distance between Rm and P is recorded as: Dm;
If point P is neither within the triangular area enclosed by Ri, Rj and Rk nor on the same straight line as the three points, P is not on the same road as the three points; the accident handling personnel probably used a nearby building or landmark as a reference when entering the accident site.
a6: comparing Dm with a preset positioning threshold;
if Dm is smaller than the positioning threshold value, judging the Rm point as a final positioning point corresponding to the data to be processed;
Otherwise, judging that a final positioning point corresponding to the data to be processed cannot be found.
The positioning threshold is a preset distance value in meters. It is usually set to the average width of the road and the roadside buildings, and its specific value may differ between urban areas; alternatively, an empirical value may be estimated from historical data. In this embodiment, the positioning threshold is 10 m.
The point Rm nearest to point P (the accident site entered by the accident handling personnel) is found on the road where the accident occurred, and the distance Dm between Rm and P is compared with the positioning threshold; if Dm is smaller than the positioning threshold, point Rm can be regarded as the real place on the road where the accident occurred;
Otherwise, if Dm is greater than or equal to the positioning threshold, Rm is too far from point P, that is, the point P entered by the accident handling personnel is too far from the road, and the accident cannot be positioned on the road from point P; the final positioning point corresponding to the data to be processed therefore cannot be found.
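The road projection calculation in steps a1 to a6 can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: points are (lat, lon) tuples in degrees, collinearity and the point-in-triangle test are evaluated in the flat latitude/longitude plane using a cross product, the tolerance `tol` is an arbitrary illustrative value, and R must contain at least three points. All function names are hypothetical.

```python
import math


def haversine_m(p, q):
    """Distance in meters between two (lat, lon) points given in degrees."""
    lat_p, lon_p, lat_q, lon_q = map(math.radians, (*p, *q))
    a = (math.sin((lat_q - lat_p) / 2) ** 2
         + math.cos(lat_p) * math.cos(lat_q) * math.sin((lon_q - lon_p) / 2) ** 2)
    return 6371 * 1000 * 2 * math.asin(math.sqrt(a))


def cross(u, v, w):
    """Zero when u, v and w are collinear in the lat/lon plane."""
    return (v[1] - u[1]) * (w[0] - u[0]) - (v[0] - u[0]) * (w[1] - u[1])


def collinear(u, v, w, tol=1e-12):
    return abs(cross(u, v, w)) <= tol


def in_triangle(p, a, b, c):
    """True when p lies inside or on the edge of triangle abc."""
    signs = [cross(a, b, p), cross(b, c, p), cross(c, a, p)]
    return not (any(s < 0 for s in signs) and any(s > 0 for s in signs))


def project(p, road, positioning_threshold_m=10.0):
    """Return the final positioning point, or None when it cannot be found."""
    by_distance = sorted(road, key=lambda r: haversine_m(p, r))  # a1
    ri, rj, rk = by_distance[:3]
    if collinear(ri, rj, rk):              # a2
        if collinear(ri, rj, p):           # a3
            return p
    elif in_triangle(p, ri, rj, rk):       # a4
        return p
    rm = by_distance[0]                    # a5: closest point Rm
    dm = haversine_m(p, rm)
    return rm if dm < positioning_threshold_m else None  # a6
```

Sorting all of R by distance keeps the sketch short; for long roads, selecting the three smallest distances without a full sort would serve equally well.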
In the embodiment shown in fig. 6, point P is Lake Center Building A. Assume that the calculation in step a1 finds the three points on the set R corresponding to Ring Lake Road that are closest to Lake Center Building A: reference point 1, the accident site and reference point 2.
Reference point 1, the accident site and reference point 2 are not on the same straight line, and Lake Center Building A is not within the triangular area enclosed by the three points, so the accident handling personnel evidently chose a point outside Ring Lake Road as the positioning reference for the accident.
The possible situations are as follows:
Case 1: if the accident site is the point closest to Lake Center Building A and the distance between them is smaller than 10 m, the accident site is positioned accurately;
Case 2: if the accident site is the point closest to Lake Center Building A and the distance between them is greater than or equal to 10 m, point P is judged to be too far from the road entered by the accident handling personnel, and the final positioning point corresponding to the data to be processed cannot be found;
Case 3: if reference point 1 is the point closest to Lake Center Building A and the distance between them is smaller than 10 m, reference point 1 is taken as the final positioning point corresponding to the data to be processed. Although reference point 1 and the accident site are not the same point, the average distance between the longitude and latitude points on Ring Lake Road after the expansion point density operation is smaller than 20 m and both points are close to Lake Center Building A, so even if reference point 1 is judged to be the final positioning point, the distance error is acceptable in practical applications such as statistical analysis.
In specific applications, if a more accurate result is required, the accuracy of the final result can be controlled by adjusting the positioning threshold and the distance threshold. In the method, if point P cannot be found, which indicates that the original data is wrong, an accurate prompt is given in step S7. If point P does exist but the method concludes that the final positioning point corresponding to the data to be processed cannot be found, it can be judged that the reference point entered by the accident handling personnel is too far from the actual accident site, and the triangular area enclosed by Ri, Rj and Rk is fed back to the user as the possible accident area. The method can position the accident site to the real place on the accident road, and even when an abnormal result occurs, the possible range of the accident site is narrowed; compared with other positioning methods in the prior art, the accuracy of accident site positioning from accident information described in natural language is effectively improved.
A natural language processing-based traffic accident location system, as shown in fig. 2, comprising: the system comprises an accident site analysis module 10, a road retrieval module 20, a road encryption module 30, a geocoding module 40 and a projection positioning module 50.
After the data to be processed is sent to the accident site analysis module 10, the accident site analysis module 10 analyzes it based on the stop word list and the regular expression for word segmentation, and obtains the accident positioning information corresponding to the data to be processed. The accident positioning information includes: the city, road, street and place where the accident occurred. The accident positioning information is fed into the road retrieval module 20 and the geocoding module 40.
After the accident site analysis module 10 extracts the accident positioning information, it standardizes the text of the accident positioning information, unifying its data format into standardized location information: XX city, XX district (county or county-level city), XX street, XX road, XX.
The road retrieval module 20 retrieves the road information of the accident occurrence road through the retrieval interface of an offline map, taking the city and road names as inputs; acquires the corresponding digital road information data described by longitude and latitude to obtain the initial road longitude and latitude data set; and feeds the initial road longitude and latitude data set into the road encryption module 30.
In the road encryption module 30, it is judged whether the accident occurrence road is an information sparse road. If so, the expansion point density operation is performed on the sparse initial road longitude and latitude data set, expanding the overly sparse digital road array into a denser array to obtain the longitude and latitude data set R of the road to be processed, R = [R1, R2, …, Rn]. Otherwise, if the accident occurrence road already has enough longitude and latitude points, the initial road longitude and latitude data set is directly recorded as the longitude and latitude data set R of the road to be processed. The data set R is then fed into the projection positioning module 50.
In the geocoding module 40, the accident positioning information, in the form of the standardized location information input from the accident site analysis module 10, is converted into the corresponding longitude and latitude coordinate point using the geocoding interface, and is recorded as: the original accident site P. The original accident site P is fed into the projection positioning module 50.
The geocoding module 40 further includes a reliability analysis operation. When the geocoding interface returns the longitude and latitude coordinate point corresponding to the accident positioning information, the returned data is recorded as: data to be judged. The geocoding module 40 performs reliability analysis on the data to be judged and calculates its confidence. If the confidence is greater than 60, the coding result is considered reliable, and the longitude and latitude are set as the original accident site P. If not, the accident site text information is refined and sent to the geocoding interface again, until the confidence of the data to be judged is greater than 60.
In practical application, the reliability analysis may be implemented using reliability analysis techniques in the prior art. A geocoding interface provided by a Web service API may also return, together with the longitude and latitude coordinate point, the confidence corresponding to that point; where the required calculation accuracy is not high, this confidence can be used directly.
The operation of refining the accident site text information comprises the following steps: taking the accident positioning text most recently submitted to the geocoding interface, adding the preset high-frequency address words one at a time to the corresponding street, road and place fields, and submitting the text to the geocoding interface again for repositioning after each high-frequency address word is added;
the high frequency address term includes: bus stops, communities, hu-he, dao, lane, department, fork, street, and road.
The operation of refining the accident site text information is specifically implemented as follows: the standardized location information sent to the geocoding module 40 by the accident site analysis module 10 is: City A, District B, Street C, Road D; the confidence corresponding to the data to be judged, in longitude and latitude coordinate point format, obtained by conversion in the geocoding module 40 is 40; the standardized location information is therefore refined in the geocoding module 40 and changed into: City A, District B, Street C, Road D, bus stop; the coordinate point conversion is carried out again through the geocoding interface, and reliability analysis is performed on the newly obtained data to be judged. This is repeated until the confidence is greater than 60, which ensures the accuracy of subsequent calculation.
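The refinement loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the `geocode` callable, its `(lat, lon, confidence)` return shape, and the word list are hypothetical, each high-frequency word is appended to the original text rather than to a specific field, and the loop stops when the word list is exhausted instead of retrying indefinitely.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical subset of the preset high-frequency address words.
HIGH_FREQ_WORDS = ("bus stop", "community", "lane", "fork")
CONFIDENCE_THRESHOLD = 60  # threshold stated in the text

# Assumed return shape of the geocoding interface: (lat, lon, confidence).
GeocodeResult = Tuple[float, float, float]


def geocode_with_refinement(
    address: str,
    geocode: Callable[[str], Optional[GeocodeResult]],
    words: Sequence[str] = HIGH_FREQ_WORDS,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) of the first result whose confidence exceeds
    the threshold, trying the original text first and then one added
    high-frequency word at a time; return None if none qualifies."""
    candidates = [address] + [f"{address} {word}" for word in words]
    for text in candidates:
        result = geocode(text)
        if result is None:  # abnormal response: try the next candidate
            continue
        lat, lon, confidence = result
        if confidence > threshold:
            return (lat, lon)
    return None
```

Bounding the loop by the word list is a design choice of this sketch; the text itself only states that refinement continues until the confidence exceeds 60.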
The projection positioning module 50 performs the road projection calculation based on the longitude and latitude data set R of the road to be processed, which meets the calculation requirement; it calculates the positional relationship between the point P and the road represented by the data set R, and thereby determines the final positioning point corresponding to the data to be processed.
The method requires the traffic accident site to be positioned accurately on the road, which in turn requires accurate road data. At present, electronic map providers offer road retrieval schemes based on offline maps, but the returned road data suffers from sparse point density.
Because the amount of data after the expansion point density processing is fixed, the method uses a stack as the temporary storage container for data processing, which is easy to implement and keeps the calculation simple.
1.1: importing the accident data;
1.2: pushing all the accident data onto a stack;
1.3: taking out the accident data at the top of the stack;
1.4: building a stop word list which, in addition to the commonly used Baidu stop words, includes words that frequently occur in traffic accident information, such as road, place and intersection; writing regular expressions for word segmentation; and performing Chinese word segmentation on content such as the case brief, accident location and basic facts in the accident information;
1.5: extracting, from the word segmentation result, the city, county, street, road and accident site of the accident in the text information;
1.6: inputting the acquired city and road information into the road retrieval interface of the offline electronic map, and acquiring the road longitude and latitude data set R of the accident, thereby completing the road retrieval function;
1.7: calculating the spatial distance between adjacent points in the road longitude and latitude data set R based on a spatial point distance algorithm;
1.8: calculating the average of the distances between all adjacent points, mean_dist; if mean_dist is greater than the distance threshold of 20 m, executing 1.9; otherwise, executing 1.10;
1.9: first calculating the number inset_num of longitude and latitude coordinate points to be inserted, and then interpolating the road data, using mean value filling as the interpolation method;
The calculation method of the interpolation number between adjacent points is as follows:
inset_num = ceil(mean_dist/20)
wherein inset_num is the number of points to be inserted, ceil is the round-up (ceiling) function, and mean_dist is the average distance between all adjacent points;
after interpolation of the road data, the new densified road data points R = [R1, R2, …, Rn] are obtained, where Ri = [lat_i, lon_i] (i = 1, 2, 3, …, n).
1.10: judging whether the accident data stack is empty; if not, repeating 1.3-1.8;
if so, the algorithm ends, and the roads on which all the accident sites are located have been obtained, each meeting the density condition.
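Steps 1.7 to 1.9 can be sketched as follows. This is a minimal sketch under assumptions: the text does not name the spatial point distance algorithm, so the haversine formula is used here as a stand-in, and "mean value filling" is interpreted as inserting evenly spaced points between each pair of adjacent points. The function names and the Earth radius constant are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees
EARTH_RADIUS_M = 6371000.0   # assumed mean Earth radius


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres (stand-in for the unspecified
    spatial point distance algorithm of step 1.7)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def densify(road: List[Point], threshold_m: float = 20.0) -> List[Point]:
    """Return the road unchanged if its mean adjacent-point distance is
    within the threshold; otherwise insert inset_num evenly spaced
    points between every pair of adjacent points."""
    if len(road) < 2:
        return list(road)
    gaps = [haversine_m(p, q) for p, q in zip(road, road[1:])]
    mean_dist = sum(gaps) / len(gaps)              # step 1.8
    if mean_dist <= threshold_m:
        return list(road)
    inset_num = math.ceil(mean_dist / threshold_m)  # step 1.9
    dense: List[Point] = []
    for p, q in zip(road, road[1:]):
        dense.append(p)
        for k in range(1, inset_num + 1):
            t = k / (inset_num + 1)  # fraction of the way from p to q
            dense.append((p[0] + t * (q[0] - p[0]),
                          p[1] + t * (q[1] - p[1])))
    dense.append(road[-1])
    return dense
```

Because inset_num is computed from the mean distance, a segment much longer than the mean can still exceed 20 m after interpolation; a per-segment count would avoid this but departs from the formula given in the text.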
In the application, the geocoding module 40 converts the standardized accident site text information into longitude and latitude coordinate points; as shown in fig. 4, the specific implementation process is as follows:
2.1: importing the standardized traffic accident site data set and pushing it onto a stack;
2.2: taking out the site data at the top of the stack;
2.3: inputting the data into the geocoding interface address and initiating a network request;
2.4: detecting whether the returned result is abnormal; if so, setting the final result to a null value; if not, proceeding to the next step;
2.5: calculating the confidence of the returned result;
2.6: if the confidence is greater than 60, the coding result is considered reliable and the longitude and latitude are set as the result; if not, the accident site text information is refined and 2.3-2.6 are executed again;
2.7: pushing the encoded data onto a result stack;
2.8: popping the accident site data;
2.9: judging whether the accident site data stack is empty; if not, re-executing 2.2-2.9; if so, ending.
In the projection positioning module 50, a location mapping method based on distance projection is adopted, so that a traffic accident site whose geocoded longitude and latitude do not fall on the road can be projected onto the road to obtain the final longitude and latitude coordinates. The specific algorithm is as follows:
3.1: pushing the longitude and latitude points calculated by the geocoding module 40 onto a stack;
3.2: taking out the data point P at the top of the stack;
3.3: based on the standardized accident site data obtained by the accident site analysis module 10, identifying the road on which the accident site is located, and taking out the road data coordinate set R processed by the road retrieval module 20 and the road encryption module 30;
3.4: the distances between the P point and all the points in the R set are calculated by the following method:
3.5: according to the calculated distances D, taking out the 3 points Ri, Rj and Rk in R that are closest to P;
3.6: if Ri, Rj and Rk are on the same straight line, executing 3.7;
otherwise, executing 3.8;
3.7: judging whether P and the points Ri, Rj and Rk are on the same straight line; if so, the point P is the final positioning point corresponding to the data to be processed;
otherwise, executing 3.9;
3.8: judging whether P is within the triangular area enclosed by the three points Ri, Rj and Rk; if so, the point P is the final positioning point, and the process jumps directly to 3.11; if not, proceeding to the next step 3.9;
3.9: according to D, taking out the point Rm in R that is closest to P, and obtaining from D the distance dist between P and Rm;
3.10: if dist is less than 10 m, Rm is the final positioning point; if not, no accurate positioning point can be obtained for the point P;
3.11: pushing the result onto a new positioning stack, and popping the point P;
3.12: judging whether the original positioning site stack is empty; if not, re-executing 3.2-3.11; if so, ending the algorithm.
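The decision logic of steps 3.4 to 3.10 for a single point P can be sketched as follows. This is an illustrative sketch under assumptions, not the algorithm as claimed: distances are computed in a local planar approximation centred on P rather than by the distance formula of step 3.4, the collinearity tolerance `collinear_tol_m2` is a hypothetical parameter needed for floating-point comparison, and a point P accepted in 3.7 or 3.8 is returned immediately.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees
EARTH_RADIUS_M = 6371000.0   # assumed mean Earth radius


def _cross(a, b, c) -> float:
    """Twice the signed area of triangle a-b-c, in square metres."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def locate(p: Point, road: List[Point],
           positioning_threshold_m: float = 10.0,
           collinear_tol_m2: float = 1.0) -> Optional[Point]:
    """Return the final positioning point for P, or None if none exists."""
    # Local planar coordinates in metres, with P at the origin.
    k = EARTH_RADIUS_M * math.pi / 180.0  # metres per degree of latitude
    cos_lat = math.cos(math.radians(p[0]))
    xy = [((q[1] - p[1]) * k * cos_lat, (q[0] - p[0]) * k) for q in road]
    dist = [math.hypot(x, y) for x, y in xy]                 # distances D
    order = sorted(range(len(road)), key=dist.__getitem__)
    origin = (0.0, 0.0)
    if len(order) >= 3:
        a, b, c = (xy[i] for i in order[:3])                 # Ri, Rj, Rk
        if abs(_cross(a, b, c)) <= collinear_tol_m2:         # step 3.6
            if abs(_cross(a, b, origin)) <= collinear_tol_m2:  # step 3.7
                return p
        else:                                                # step 3.8
            signs = (_cross(a, b, origin), _cross(b, c, origin),
                     _cross(c, a, origin))
            if min(signs) >= 0 or max(signs) <= 0:  # P inside triangle
                return p
    # Steps 3.9 and 3.10: fall back to the nearest road point Rm.
    if order and dist[order[0]] < positioning_threshold_m:
        return road[order[0]]
    return None
```

When `locate` returns `None`, a caller could report the triangle formed by the three nearest road points as the possible accident area, as the description suggests for this case.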
With the technical scheme of the application, the problem is approached from the perspective of traffic accident site positioning, and both the difficulty of acquiring accident road data and the accuracy of site positioning are taken into account. On the basis of analyzing the accident text information, the longitude and latitude data set of the road is acquired from an electronic map, and an accurate accident site positioning method based on projection mapping is provided, which ensures that the positioning point falls on the road where the accident occurred. The method can process traffic accident information both in real time and from historical data. The text analysis method provided by the application can also serve as a reference for natural language processing in other machine learning applications.