CN116992858B

CN116992858B - A method and system for locating traffic accident locations based on natural language processing

Info

Publication number: CN116992858B
Application number: CN202310961238.XA
Authority: CN
Inventors: 黄钢; 高岩; 许卉莹; 李平凡; 瞿伟斌; 邓毅萍; 张爱红
Original assignee: Traffic Management Research Institute of Ministry of Public Security
Current assignee: Traffic Management Research Institute of Ministry of Public Security
Priority date: 2023-08-02
Filing date: 2023-08-02
Publication date: 2024-08-30
Anticipated expiration: 2043-08-02
Also published as: CN116992858A

Abstract

The present application provides a method for locating a traffic accident site based on natural language processing, which fully considers the problem of inaccurate spatial location positioning based on traffic accident information inputted in natural language, and obtains standardized accident site text information by combining a text parsing method using a stop word list and regular expressions for word segmentation, thereby obtaining accident location information; using road retrieval technology, obtaining a digital road information dataset R based on longitude and latitude description corresponding to the road where the accident occurred, and using geocoding technology, solving the accident site text information in the accident location information into a digital information location of longitude and latitude, thereby obtaining the original accident site P; and then using a road projection calculation method, projecting point P onto the digital road information dataset R, and accurately finding the final location point corresponding to the data to be processed on the road where the accident occurred.

Description

Traffic accident location method and system based on natural language processing

Technical Field

The invention relates to the technical field of intelligent traffic control, in particular to a traffic accident location method and system based on natural language processing.

Background

Locating traffic accident sites has long been a major obstacle to improving traffic management. At present, many internet platforms provide open map interfaces with rich functionality, from which positioning information can be obtained as needed; however, when these interfaces are applied to accident site positioning, in most cases they fail to place the site on a road. Practitioners therefore mostly work on geocoding methods built on existing platforms, with the aim of obtaining accurate accident location information and providing a more refined data source for traffic management departments.

Chinese patent document CN 112052908A provides a traffic accident site clustering method: first, the accident site data recorded in a traffic accident information collection form are converted into longitude and latitude coordinate points; second, the coordinate points in the original coordinate system are converted into coordinate points in the target coordinate system; then the distances between the points are calculated from the converted coordinates; finally, accidents with the same spatial distribution characteristics are clustered according to the spatial distribution of the accident sites. That method focuses on processing location data after positioning has been completed, and does not fully consider the accuracy of the positioned points.

Chinese patent document CN 108320515B provides an automatic road network matching and checking method for traffic accident sites, which comprises: obtaining traffic accident attribute information; matching road names according to the national highway naming standard to obtain a mapping table of traffic road codes, national highway numbers and names; acquiring the accident longitude and latitude coordinates matched to the mileage stake number; checking the spatial relationship between the accident coordinates and the administrative division to obtain a first accident coordinate point that passes the check; checking whether the first accident coordinate point is consistent with the accident site description text to obtain a second accident coordinate point that passes the check; and generating a geographic data set for traffic accident analysis. That patent focuses on road network matching; it can accurately locate expressway accident sites marked with mileage stakes, but cannot accurately locate accidents on urban roads, where accidents are more frequent.

Chinese patent document CN 103631776B provides a method for automatically recording and positioning semantically expressed traffic accident site information, which comprises: calling an accident site input and automatic positioning control when an integrated accident data management system receives input accident data; the control requesting road name data and the topological structure of the road network from a road name dictionary server; the control receiving the accident site information input by a user and performing automatic positioning according to that input; judging whether the positioning is correct and, if so, submitting it; otherwise, the user manually dragging the accident site icon to position it and then submitting; and the control returning the accident site input by the user, together with the positioning information, to the integrated accident data management system. That patent provides a method for entering accident site text in a customized format; it is suitable only for accident systems whose records are entered according to its rules, and is not suitable for the hundreds of millions of non-standardized accident site records in current accident systems.

Chinese patent document CN107270922B provides a traffic accident spatial positioning method based on a POI index, comprising the following steps. First step: screening POI place information to extract the GPS coordinates of specific traffic places. Second step: map-matching the GPS coordinates of the specific traffic places to obtain the set of road links within 30 m of those coordinates. Third step: obtaining road grades from the map file and calculating the traffic flow direction of each road. Fourth step: for the traffic places and the road link set obtained in the second step, looking up the traffic flow direction and road grade from the third step for each road link number in the set, and constructing a POI index table with four fields: traffic place, road link number, road grade and traffic direction. Fifth step: extracting traffic accident information from accident broadcast information. Sixth step: matching the traffic accident information obtained in the fifth step against the traffic place and traffic flow direction in the POI index table constructed in the fourth step; the final spatial position is then obtained by screening on road grade. That method does not consider the specific road on which the accident occurred; it only obtains the set of road links within 30 m, and therefore cannot accurately place the accident on a road.

The above four patent documents all relate to traffic accident location. However, traffic databases contain a large amount of previously entered traffic accident information. When those records were entered, traffic police mostly described accident sites in natural language, so the recorded site descriptions are neither uniform nor standardized. As a result, the prior art cannot automatically and accurately locate the accident sites in historical data using a single unified method. When such historical data must be used, statistics can only be compiled manually, which is inefficient and error-prone.

Disclosure of Invention

Prior-art traffic accident location methods produce positions that are not accurate enough and cannot comprehensively handle the accident site descriptions found in historical data. To solve these problems, the application provides a traffic accident location method based on natural language processing, which is suitable for locating accident sites across the various cases found in historical data and ensures that the site is ultimately positioned on a road. The application also provides a traffic accident location system based on natural language processing.

The technical scheme of the invention is as follows: a traffic accident location method based on natural language processing comprises the following steps:

S1: acquiring information data of a traffic accident site to be processed based on natural language expression, and recording the information data as: data to be processed;

The method is characterized by further comprising the following steps:

S2: constructing a stop word list for parsing traffic accident location information and a regular expression for word segmentation;

S3: parsing the data to be processed based on the stop word list and the regular expression to obtain the corresponding accident positioning information;

The accident positioning information includes: cities, roads, streets and places where accidents occur;

S4: using road retrieval technology, acquiring the digital road information data, described in longitude and latitude, for the road on which the accident occurred, thereby obtaining a longitude and latitude data set describing that road, recorded as: the initial road longitude and latitude data set;

S5: judging, based on the data in the initial road longitude and latitude data set, whether the road on which the accident occurred is an information-sparse road;

If yes, executing step S6;

Otherwise, recording the initial road longitude and latitude data set as the longitude and latitude data set R of the road to be processed, and executing step S7;

R = [R1, R2, …, Rn], wherein n is the number of longitude and latitude points included in the data set and is a positive integer;

S6: performing expansion point density operation on the initial road longitude and latitude data set, expanding the number of data in the data set to obtain a digital road set, and marking the digital road set as: a longitude and latitude data set R of a road to be processed;

S7: using a geocoding technology, resolving accident location text information in the accident positioning information into longitude and latitude digital information locations, and marking the longitude and latitude digital information locations as: an original accident site P;

if the original accident site P can be found, step S8 is performed;

Otherwise, the P point cannot be found, the original data is judged to be wrong, error information is returned, and the calculation is finished;

S8: carrying out road projection calculation on the original accident site P and the longitude and latitude data set R of the road to be processed, and judging a final positioning point corresponding to the data to be processed;

The road projection calculation specifically comprises the following steps:

a1: calculating the spatial distance between point P and every point in the longitude and latitude data set R of the road to be processed, and finding the 3 points Ri, Rj and Rk in R that are nearest to P;

a2: judging the positional relationship of points Ri, Rj and Rk; if Ri, Rj and Rk lie on the same straight line, executing step a3;

Otherwise, the three points are not collinear, and step a4 is executed;

a3: judging whether P lies on the same straight line as points Ri, Rj and Rk; if so, judging point P to be the final positioning point corresponding to the data to be processed;

Otherwise, executing step a5;

a4: drawing a triangle with points Ri, Rj and Rk as vertices, recorded as: the positioning area;

Judging whether point P lies within the positioning area;

If yes, judging point P to be the final positioning point corresponding to the data to be processed;

Otherwise, executing step a5;

a5: finding the point in R closest to P, recorded as: the closest point Rm;

The distance between Rm and P is recorded as: Dm;

a6: comparing Dm with a preset positioning threshold;

If Dm is smaller than the positioning threshold, judging point Rm to be the final positioning point corresponding to the data to be processed;

Otherwise, judging that no final positioning point corresponding to the data to be processed can be found.
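Steps a1 to a6 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the (lon, lat) tuple layout, the collinearity tolerance `eps` and the threshold values used in the usage note are assumptions made for illustration, and the collinearity and triangle tests treat longitude and latitude as planar coordinates, which is only a reasonable approximation over short distances.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    lon_p, lat_p = map(math.radians, p)
    lon_q, lat_q = map(math.radians, q)
    a = (math.sin((lat_q - lat_p) / 2) ** 2
         + math.cos(lat_p) * math.cos(lat_q) * math.sin((lon_q - lon_p) / 2) ** 2)
    return 6371 * 2 * math.asin(min(1.0, math.sqrt(a))) * 1000

def cross(o, a, b):
    """2D cross product of vectors o->a and o->b (planar approximation)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def collinear(a, b, c, eps=1e-12):
    """True when a, b, c lie on one line; eps is a hypothetical tolerance."""
    return abs(cross(a, b, c)) <= eps

def in_triangle(p, a, b, c):
    """True when p is inside or on the edge of triangle abc."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def project_to_road(p, road, threshold_m):
    """Return (final_point, reference_area); exactly one of them is None."""
    if len(road) < 3:
        raise ValueError("need at least 3 road points")
    # a1: the three road points nearest to P
    nearest = sorted(road, key=lambda r: haversine_m(p, r))
    ri, rj, rk = nearest[:3]
    if collinear(ri, rj, rk):
        # a3: P is accepted when it lies on the same line
        if collinear(ri, rj, p):
            return p, None
    elif in_triangle(p, ri, rj, rk):
        # a4: P is accepted when it lies inside the positioning area
        return p, None
    # a5: closest road point Rm and its distance Dm
    rm = nearest[0]
    dm = haversine_m(p, rm)
    # a6: accept Rm only when Dm is below the positioning threshold
    if dm < threshold_m:
        return rm, None
    return None, (ri, rj, rk)
```

For example, with a straight road sampled along latitude 31.0 and a point about 22 m off the road, a 50 m threshold snaps the point to the nearest road sample, while a 10 m threshold returns only the reference area formed by the three nearest points.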

It is further characterized by:

Based on longitude and latitude, the method for calculating the spatial distance between two points comprises the following steps:

Let the two points involved in the calculation be x and y, where lon_x and lat_x denote the longitude and latitude of point x, and lon_y and lat_y denote the longitude and latitude of point y. The distance dist from point x to point y is then calculated as follows:

lat_x = lat_x × π / 180;

lon_x = lon_x × π / 180;

lat_y = lat_y × π / 180;

lon_y = lon_y × π / 180;

Δlat = lat_y - lat_x;

Δlon = lon_y - lon_x;

a = sin(Δlat / 2)^2 + cos(lat_y) * cos(lat_x) * sin(Δlon / 2)^2;

c = 2 * arcsin(sqrt(a));

dist = 6371 * c * 1000;
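The distance calculation above is the haversine formula with an Earth radius of 6371 km, scaled by 1000 to give metres. A minimal Python sketch follows; the function name and argument order are illustrative choices, and the `min(1.0, ...)` clamp is an added guard against floating-point rounding that the text does not mention.

```python
import math

EARTH_RADIUS_KM = 6371  # radius used in the formulas above

def spatial_distance_m(lon_x, lat_x, lon_y, lat_y):
    """Distance in metres between points x and y given in decimal degrees."""
    # Degrees to radians, as in the first four formulas
    lat_x, lon_x, lat_y, lon_y = map(math.radians, (lat_x, lon_x, lat_y, lon_y))
    d_lat = lat_y - lat_x
    d_lon = lon_y - lon_x
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(lat_y) * math.cos(lat_x) * math.sin(d_lon / 2) ** 2)
    # Clamp guards against sqrt(a) drifting above 1 through rounding
    c = 2 * math.asin(min(1.0, math.sqrt(a)))
    return EARTH_RADIUS_KM * c * 1000
```

As a sanity check, one degree of longitude along the equator comes out at roughly 111.19 km, which is 6371 km × π / 180.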

in step S5, the method for judging the information sparse road includes:

b1: calculating the spatial distance between all adjacent points in the longitude and latitude data set of the road to be processed;

b2: calculating the average value of all the distances to obtain the average distance between adjacent points;

b3: comparing the average distance of the adjacent points with a preset distance threshold;

In step S6, the expanding point density operation includes the following steps:

c1: confirming all adjacent points in the initial road longitude and latitude data set;

c2: calculating the number of points to be inserted, and then interpolating between all adjacent points by mean filling to obtain: an intermediate data set Rt;

inset_num = ceil(mean_dist / distance threshold)

Wherein inset_num is the number of points to be inserted, ceil denotes rounding up, and mean_dist is the average distance between all adjacent points; the distance threshold is the preset ideal spacing between adjacent points;

c3: the intermediate data set Rt is noted as: a longitude and latitude data set R of a road to be processed;

In step S7, the method for locating the original accident site P includes the following steps:

d1: determining an online geocoding service interface using an online map, transcoding the information of the city, the road and the accident site obtained by analysis, and writing the information into the online geocoding service interface;

d2: accessing the online geocoding service interface to acquire returned json data;

d3: analyzing the json data, and taking out longitude and latitude coordinate points in the json data to obtain the original accident site P;

P = [lon_P, lat_P], wherein lon represents the longitude coordinate and lat represents the latitude coordinate;

In step a6, when it is judged that no final positioning point corresponding to the data to be processed can be found, the positioning area with points Ri, Rj and Rk as vertices is fed back to the user as a reference area for the accident location.

A natural language processing-based traffic accident location positioning system, comprising: the system comprises an accident site analysis module, a road retrieval module, a road encryption module, a geocoding module and a projection positioning module;

After the data to be processed is sent to the accident location analysis module, analyzing and obtaining accident positioning information corresponding to the data to be processed by the accident location analysis module based on a stop word list and a regular expression for word segmentation; the accident positioning information includes: cities, roads, streets and places where accidents occur; and sending the accident positioning information to the road retrieval module and the geocoding module;

The road retrieval module is used for carrying out road information retrieval on the accident occurrence road based on the offline map interface, acquiring corresponding digital road information data based on longitude and latitude description, and acquiring an initial road longitude and latitude data set; the initial road longitude and latitude data set is sent to the road encryption module;

The road encryption module judges whether the road on which the accident occurred is an information-sparse road; if yes, it performs the expansion point density operation on the sparse initial road longitude and latitude data set, expanding the overly sparse digital road array into a denser array to obtain the longitude and latitude data set R of the road to be processed, R = [R1, R2, …, Rn]; otherwise, if the road has enough longitude and latitude points, the initial road longitude and latitude data set is directly recorded as the longitude and latitude data set R of the road to be processed; the data set R is sent to the projection positioning module;

In the geocoding module, the accident positioning information transmitted by the accident location analysis module is converted into corresponding longitude and latitude coordinate points by using a geocoding interface and is recorded as: an original accident site P; the original accident site P is sent to the projection positioning module;

And the projection positioning module performs road projection calculation based on the received original accident site P and the longitude and latitude data set R of the road to be processed, and judges a final positioning point corresponding to the data to be processed.

It is further characterized by:

The stop word list includes: place adverbs, intersections, roads, road sections and places;

After the accident location analysis module extracts the accident location information, the accident location information is subjected to standardized processing, and the data form of the accident location information is unified as follows: XX is the XX road XX of XX street in XX region XX;

The geocoding module further comprises reliability analysis operation, and when the geocoding interface returns the longitude and latitude coordinate points corresponding to the accident positioning information, the returned data are recorded as: data to be judged; the geocoding module performs reliability analysis on the data to be judged, calculates confidence of the data to be judged, and if the confidence is greater than 60, considers the coding result to be reliable, and sets the longitude and latitude as the original accident site P; if not, refining accident site text information, and re-sending the accident site text information to the geocoding interface until the confidence coefficient of the data to be judged is more than 60;

The operation of refining the accident site text information comprises the following steps: acquiring the text of the accident positioning information of the last time of the geocoding interface, adding preset high-frequency address words one by one in the corresponding fields of streets, roads and places, and sending the words to the geocoding interface for repositioning after adding one high-frequency address word each time;

the high frequency address term includes: bus stops, communities, hu-he, dao, lane, department, fork, street, and road.
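The reliability check and the refinement step can be sketched as a retry loop. This is a minimal illustration under stated assumptions: `geocode` is a hypothetical callable standing in for the geocoding interface and is assumed to return a dict with `lon`, `lat` and `confidence` keys (or None); the field names `street`, `road` and `place` are illustrative; and the loop stops once every high-frequency word has been tried, a simplification of retrying until the confidence exceeds 60.

```python
def refine_and_geocode(fields, geocode, high_freq_words, min_confidence=60):
    """Return [lon, lat] of the first sufficiently confident result, else None.

    fields: ordered dict of address parts, e.g. city, road, place.
    geocode: hypothetical callable(text) -> {"lon", "lat", "confidence"} or None.
    """
    def attempt(candidate):
        result = geocode("".join(candidate.values()))
        if result is not None and result["confidence"] > min_confidence:
            return [result["lon"], result["lat"]]
        return None

    # First try the address exactly as parsed
    point = attempt(fields)
    if point is not None:
        return point

    # Then append one high-frequency address word at a time to each field
    for name in ("street", "road", "place"):
        if name not in fields:
            continue
        for word in high_freq_words:
            candidate = dict(fields)
            candidate[name] = candidate[name] + word
            point = attempt(candidate)
            if point is not None:
                return point
    return None
```

Passing the geocoder in as a parameter keeps the sketch independent of any particular map provider and makes the retry logic testable with a stub.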

The traffic accident location method based on natural language processing provided by the application fully considers the problem that spatial positioning is inaccurate when it relies directly on traffic accident information entered in natural language. It obtains standardized accident site text by combining a stop word list with a text parsing method that uses regular expressions for word segmentation, and thereby obtains the accident positioning information. Using road retrieval technology, it acquires the digital road information data set R, described in longitude and latitude, for the road on which the accident occurred; using geocoding technology, it resolves the accident site text in the accident positioning information into a longitude and latitude position to obtain the original accident site P. It then uses the road projection calculation to project point P onto the digital road information data set R. Even if the accident site entered by the case handler does not lie on the road represented by R, a mapped point for P can be found on that road, so the final positioning point corresponding to the data to be processed can be accurately found on the road where the accident occurred. The method is suitable for locating accident sites across the various cases found in historical data, and ensures that the site is ultimately positioned on the accident road.

Drawings

FIG. 1 is a flow chart of the steps of the traffic accident location positioning method provided by the invention;

FIG. 2 is a schematic structural diagram of the traffic accident location positioning system provided by the invention;

FIG. 3 is a schematic diagram of the road retrieval and densification process of the invention;

FIG. 4 is a flow chart of the geographic coding of an accident site provided by the invention;

FIG. 5 is a flow chart of projection mapping and positioning of accident sites provided by the invention;

FIG. 6 is a schematic view of an accident site in an embodiment.

Detailed Description

In the embodiment shown in the schematic view of the accident site in FIG. 6, the ring lake road is built around a central lake; the lake has a central island on which lake center building A and lake center building B stand, and the ring lake road has branches 1 to 3. If a traffic accident occurs on the ring lake road at a location between branch 1 and branch 3, accident handling personnel, both in current practice and in historical data, may describe the accident site with wording such as "ring lake road, opposite lake center building A". However, the actual address of lake center building A is not on the ring lake road. Therefore, if "ring lake road" and "lake center building A" are directly extracted and used for retrieval, an accurate accident site cannot be obtained. With the present method, the accident can be located at the place on the ring lake road where it actually occurred, based on the input of the accident handling personnel.

In order to accurately analyze accident addresses based on natural language expression, the application provides a traffic accident location positioning method based on natural language processing, which comprises the following steps as shown in fig. 1.

S1: acquiring information data of a traffic accident site to be processed based on natural language expression, and recording the information data as: data to be processed.

In this embodiment, examples of the data to be processed based on the natural language expression are as follows:

Case details: at X hour X minute on X month X day, 2023, XXXXXXX was driving vehicle No. Su XXXXX and, on reaching the point opposite lake center building A on the ring lake road in Binhu District, Wuxi City, was involved in a traffic accident with an electric bicycle ridden by Wang X, causing personal injury and vehicle damage.

Accident site: Binhu District, ring lake road, opposite lake center building A.

The fields segmented by the regular expression for word segmentation include: case details, accident address and basic facts.

Stop words are words or characters that, in information retrieval, are automatically filtered out before or after natural language data (or text) is processed, in order to save storage space and improve search efficiency. These words are entered manually. In this application the stop words are compiled specifically from historical traffic accident data, which yields a targeted stop word list. For example, the stop word list includes: place adverbs, intersections, roads, road segments and places.

The place adverbs include words such as: at, located, open, to, etc. In practical application, the place adverbs commonly used in traffic accident descriptions can be obtained by statistical analysis of historical data.

The data to be processed in the application is information described in natural language. A regular expression is a text matching pattern that can search for specific character strings. It is an important technical component of natural language processing, commonly used in fields such as speech recognition and translation software, and can be implemented with the prior art, so it is not elaborated here.

S3: based on the stop word list and the regular expression for word segmentation, analyzing the data to be processed by using a word segmentation algorithm in the prior art to obtain corresponding accident positioning information;

the accident positioning information includes: cities, roads, streets and places where accidents occur.
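The parsing in step S3 can be illustrated with a small Python sketch that strips stop words and then applies a regular expression. Everything specific here is an assumption made for illustration: the stop words, the pattern, the field names and the Chinese sample string (a hypothetical rendering of the example accident site) are not taken from the application, which does not disclose its actual word list or expressions.

```python
import re

# Hypothetical place-adverb stop words ("located at", "drove to")
STOP_WORDS = ("位于", "行驶至")

# Hypothetical pattern: optional city, optional district, road, remaining place text
ADDRESS_PATTERN = re.compile(
    r"(?P<city>.+?市)?"
    r"(?P<district>.+?[区县])?"
    r"(?P<road>.+?(?:大道|路|街))"
    r"(?P<place>.*)"
)

def parse_accident_site(text):
    """Return a dict with city, district, road and place, or None if no road is found."""
    # Remove stop words before matching
    for word in STOP_WORDS:
        text = text.replace(word, "")
    match = ADDRESS_PATTERN.match(text)
    if match is None:
        return None
    return match.groupdict()
```

Removing stop words by plain substring replacement is the simplest possible choice and can damage names that happen to contain a stop word; a production parser would need a more careful segmentation step.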

S4: based on the road retrieval technology, acquiring digital road information data corresponding to the accident occurrence road and based on longitude and latitude description, and obtaining a longitude and latitude data set describing the accident occurrence road, wherein the longitude and latitude data set is recorded as: an initial road longitude and latitude data set.

In the present application, an offline map road retrieval technique is used; for example, the road search interface in the API provided by the Baidu offline map is used to retrieve the description information of the road on which the accident occurred. Taking the city name and the road name as input, the digital information of the road where the accident site is located is retrieved; this information comprises a number of longitude and latitude points, which constitute the initial road longitude and latitude data set. However, the number of longitude and latitude points returned differs between roads and between search interfaces: for some roads the points are plentiful, while for others, such as some remote rural roads, the points in the corresponding digital information are very sparse. To locate the accident site accurately, the road information must be sufficiently rich; the method therefore needs to judge, based on the data in the initial road longitude and latitude data set, whether the road is an information-sparse road.

S5: judging whether the accident occurrence road is an information sparse road or not based on data in the longitude and latitude data set of the initial road;

if yes, executing step S6;

R = [R1, R2, …, Rn], wherein n is the number of longitude and latitude points included in the data set and is a positive integer.

In step S5, the method for judging the information sparse road is as follows:

b1: calculating the spatial distance between all adjacent points in a longitude and latitude data set of a road to be processed based on the longitude and latitude;

The spatial distance is the actual distance between two longitude and latitude points. Whether the road needs to be densified is judged from the average distance between all adjacent points; the distance between each pair of adjacent points could also be judged individually, but that would require more computation.

The specific calculation method is as follows:

lat_j = lat_j × π / 180;

lon_j = lon_j × π / 180;

lat_i = lat_i × π / 180;

lon_i = lon_i × π / 180;

Δlat = lat_i - lat_j;

Δlon = lon_i - lon_j;

a = sin(Δlat / 2)^2 + cos(lat_i) * cos(lat_j) * sin(Δlon / 2)^2;

c = 2 * arcsin(sqrt(a));

dist_ij = 6371 * c * 1000;

Wherein lat represents the latitude coordinate, lon represents the longitude coordinate, and dist_ij represents the distance between adjacent points i and j;

b3: comparing the average distance between adjacent points with a preset distance threshold;
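The sparse-road judgment can be sketched in Python as follows. The function names and the (lon, lat) tuple layout are illustrative assumptions. The direction of the comparison, sparse when the average spacing exceeds the threshold, is inferred from the embodiment's statement that a spacing of 20 m or less is dense enough; the 20 m default comes from that same embodiment.

```python
import math

def haversine_m(p, q):
    """Distance in metres between two (lon, lat) points given in degrees."""
    lon_p, lat_p = map(math.radians, p)
    lon_q, lat_q = map(math.radians, q)
    a = (math.sin((lat_q - lat_p) / 2) ** 2
         + math.cos(lat_p) * math.cos(lat_q) * math.sin((lon_q - lon_p) / 2) ** 2)
    return 6371 * 2 * math.asin(min(1.0, math.sqrt(a))) * 1000

def mean_adjacent_distance_m(points):
    """b1 and b2: average distance between consecutive road points."""
    if len(points) < 2:
        raise ValueError("need at least 2 road points")
    gaps = [haversine_m(p, q) for p, q in zip(points, points[1:])]
    return sum(gaps) / len(gaps)

def is_information_sparse(points, distance_threshold_m=20.0):
    """b3: compare the average spacing with the preset distance threshold."""
    return mean_adjacent_distance_m(points) > distance_threshold_m
```

Points spaced 0.001 degrees of longitude apart on the equator are about 111 m apart and are judged sparse, while a spacing of 0.0001 degrees (about 11 m) is not.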

S6: performing expansion point density operation on the sparse initial road longitude and latitude data set, expanding the number of data in the data set to obtain a digital road set, and marking as: and a longitude and latitude data set R of the road to be processed.

In step S6, the expansion point density operation includes the following steps:

c2: first calculating the number of points to be inserted, then interpolating between all adjacent points by mean filling; the longitude and latitude points of the original data set together with the inserted points form an intermediate data set Rt;

inset_num = ceil(mean_dist / distance threshold)

Wherein inset_num is the number of points to be inserted, ceil denotes rounding up, and mean_dist is the average distance between all adjacent points; the distance threshold is the preset ideal spacing between adjacent points.

In this embodiment, the distance threshold is set to 20m, that is, when the distance between adjacent longitude and latitude points in the longitude and latitude data set R of the road to be processed is less than or equal to 20m, the point density of the longitude and latitude points required for calculation can be satisfied, and accurate positioning to the accident site can be ensured;

The mean filling method in the method comprises the following steps: after calculating the value of inset_num with two adjacent points as the start point and the end point, the inset_num points are added uniformly between the start point and the end point.

After the points are added, the distance between any two adjacent points between the start point and the end point is the same. For example, if the distance between adjacent points A and B is 60 m and mean_dist = 40 for the road section, then:

inset_num = ceil(40 / 20) = 2;

Then, starting from A and ending at B, 2 points, c1 and c2, are added evenly in between;

The segment AB becomes three equidistant segments Ac1, c1c2 and c2B, each 20 m long.
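The mean filling step can be sketched in Python as below. The function names and the (lat, lon) tuple layout are illustrative assumptions, and the inserted points are placed by linear interpolation of latitude and longitude, which gives near-equal spacing only over the short distances between adjacent road points.

```python
import math

def insert_count(mean_dist_m, distance_threshold_m):
    """inset_num = ceil(mean_dist / distance threshold)."""
    return math.ceil(mean_dist_m / distance_threshold_m)

def densify(points, inset_num):
    """Insert inset_num evenly spaced points between every pair of adjacent points."""
    if len(points) < 2 or inset_num <= 0:
        return list(points)
    dense = []
    for (lat_a, lon_a), (lat_b, lon_b) in zip(points, points[1:]):
        dense.append((lat_a, lon_a))
        for k in range(1, inset_num + 1):
            t = k / (inset_num + 1)  # fraction of the way from A to B
            dense.append((lat_a + t * (lat_b - lat_a),
                          lon_a + t * (lon_b - lon_a)))
    dense.append(points[-1])
    return dense
```

With mean_dist = 40 and a 20 m threshold, `insert_count` returns 2, and `densify` turns one segment AB into the three equal segments Ac1, c1c2 and c2B described in the example.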

R = [R1, R2, …, Rn], where Ri = [lat_i, lon_i] (i = 1, 2, 3, …, n), i.e. each point contains latitude and longitude information.

S7: using a geocoding technology, resolving accident location text information in accident positioning information into longitude and latitude digital information locations, and marking the longitude and latitude digital information locations as: the original accident site P. If the P point cannot be found, the original data is shown to be wrong, error information is returned, and the calculation is finished.

In step S7, the positioning method of the original accident site P includes the following steps:

d1: determining to use the geocoding interface of an online map, such as the geocoding interface provided by the Baidu offline map API, transcoding the parsed city, road and accident site information, writing it into the online geocoding service interface, and resolving the structured address (province/city/district/street/house number) into the corresponding position coordinates;

d2: accessing an online geocoding service interface to acquire returned json data;

d3: analyzing json data, and taking out longitude and latitude coordinate points in the json data to obtain an original accident site P;

P = [lon_P, lat_P], where lon represents the longitude coordinate and lat represents the latitude coordinate.
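Step d3 can be illustrated with a small parser. This is only a sketch: the source does not give the JSON layout, so the field names `result`, `location`, `lng` and `lat` are assumptions about the response format, and `parse_geocode_response` is a hypothetical name. No network request is made here.

```python
import json


def parse_geocode_response(json_text):
    """Return P as [lon, lat], or None if no coordinate can be extracted.

    Assumed (hypothetical) layout:
    {"result": {"location": {"lng": <float>, "lat": <float>}}}
    """
    try:
        data = json.loads(json_text)
        location = data["result"]["location"]
        return [float(location["lng"]), float(location["lat"])]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON or missing fields: treat P as not found (see S7).
        return None
```

Returning `None` lets the caller follow the S7 rule that a missing point P ends the calculation with an error message.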

S8: and carrying out road projection calculation on the original accident site P and the longitude and latitude data set R of the road to be processed, and judging a final positioning point corresponding to the data to be processed.

The road projection calculation specifically comprises the following steps:

a1: based on longitude and latitude, calculating the spatial distance between the P point and all points in the longitude and latitude data set R of the road to be processed, wherein the specific calculation method is as follows:

lat_P= lat_P× π/180;

lon_P= lon_P× π/180;

lat_i= lat_i× π/180;

lon_i= lon_i× π/180;

△lat = lat_i – lat_P;

△lon = lon_i – lon_P;

a = sin(△lat / 2)^2 + cos(lat_i) * cos(lat_P) * sin(△lon / 2)^2;

c = 2 * arcsin(sqrt(a));

dist_i = 6371 * c * 1000;

Wherein lon_P and lat_P respectively represent the longitude and latitude of point P, lon_i and lat_i respectively represent the longitude and latitude of the i-th point in the set R, and dist_i represents the distance from point P to point Ri; finally, the distance set D = [dist_1, dist_2, dist_3, …, dist_n] is obtained;
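The distance calculation of step a1 is the haversine formula with an Earth radius of 6371 km, and can be sketched as below. The function names are hypothetical; note that, following the definitions in this description, P is stored as [lon, lat] while each Ri is stored as [lat, lon].

```python
import math

EARTH_RADIUS_KM = 6371  # radius used in the formulas of step a1


def spatial_distance(lat_p, lon_p, lat_i, lon_i):
    """Distance in metres between P and Ri; all inputs in degrees."""
    lat_p, lon_p, lat_i, lon_i = map(math.radians, (lat_p, lon_p, lat_i, lon_i))
    d_lat = lat_i - lat_p
    d_lon = lon_i - lon_p
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(lat_i) * math.cos(lat_p) * math.sin(d_lon / 2) ** 2)
    c = 2 * math.asin(math.sqrt(a))
    return EARTH_RADIUS_KM * c * 1000


def distance_set(p, road):
    """D = [dist_1, ..., dist_n] for P = [lon, lat] and road points [lat, lon]."""
    return [spatial_distance(p[1], p[0], lat, lon) for lat, lon in road]
```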

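Find the 3 points Ri, Rj and Rk in R nearest to P, namely: find the longitude and latitude points corresponding to the three smallest values in the set D, where Ri, Rj, Rk ∈ R;

a2: determining the positional relationship among the points Ri, Rj and Rk; if Ri, Rj and Rk are on the same straight line, executing step a3;

Otherwise, the three points are not collinear, and step a4 is executed;

a3: judging whether point P and the points Ri, Rj and Rk are on the same straight line; if so, judging point P as the final positioning point corresponding to the data to be processed;

Otherwise, executing step a5;

If P and Ri, Rj and Rk are on the same straight line, P and the three points are necessarily on the same road, so point P is the final positioning point of the accident;

a4: drawing a triangle with the points Ri, Rj and Rk as vertices, denoted as: the positioning area;

judging whether point P is in the positioning area;

If yes, judging point P as the final positioning point corresponding to the data to be processed;

Otherwise, executing step a5;

If point P is in the triangular area enclosed by Ri, Rj and Rk, P and the three points are necessarily on the same road, so point P is the final positioning point of the accident;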
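The collinearity and point-in-triangle decisions can be sketched with 2-D cross products. The source does not say how these tests are computed, so this is an assumption: longitude and latitude are treated as planar x/y coordinates (reasonable only over short distances), and the tolerance `tol` and all function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o) for 2-D points."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def collinear(a, b, c, tol=1e-12):
    """True when a, b, c lie on one straight line (within tol)."""
    return abs(cross(a, b, c)) <= tol


def in_triangle(p, a, b, c):
    """True when p is inside or on the edge of triangle abc."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def classify(p, ri, rj, rk):
    """Return 'P' when P is accepted as the final point, else 'a5'."""
    if collinear(ri, rj, rk):
        # Collinear neighbours: P is final only if it lies on the same line.
        return "P" if collinear(ri, rj, p) else "a5"
    # Non-collinear neighbours: P is final only inside the positioning area.
    return "P" if in_triangle(p, ri, rj, rk) else "a5"
```

In practice the tolerance would need tuning, since real coordinates are rarely exactly collinear.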

a5: finding the point in R closest to P, denoted as: the closest point Rm;

The distance between Rm and P is denoted as: Dm;

If point P is neither in the triangular area enclosed by Ri, Rj and Rk nor on the same straight line as the three points, then P and the three points are not on the same road; the accident handling personnel may have used a nearby building or marker as the reference when inputting the accident site.

a6: comparing Dm with a preset positioning threshold;

if Dm is smaller than the positioning threshold value, judging the Rm point as a final positioning point corresponding to the data to be processed;

The positioning threshold is a preset distance value in meters. It is usually set to the average width of the road and the buildings beside the road, so its specific value can differ between urban areas; alternatively, an empirical value can be estimated from historical data. In this embodiment, the positioning threshold is 10 m.

That is, the point Rm nearest to point P (the accident site input by the accident handling personnel) is found on the road where the accident occurred, and the distance Dm between Rm and P is compared with the positioning threshold; if Dm is smaller than the positioning threshold, Rm can be regarded as the real place on the road where the accident occurred;

Otherwise, if Dm is greater than or equal to the positioning threshold, Rm is too far from point P, that is, the point P input by the accident handling personnel is too far from the input road, and the accident cannot be positioned on that road through point P; in this case the final positioning point corresponding to the data to be processed cannot be found.
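Steps a5 and a6 amount to a nearest-point lookup followed by a threshold check. A minimal sketch, assuming the distance set D has already been computed in meters and using the hypothetical name `snap_to_road`:

```python
def snap_to_road(road, distances, positioning_threshold=10.0):
    """Return the closest point Rm if Dm < threshold, otherwise None.

    `road` is the set R and `distances` is the matching distance set D.
    None means the final positioning point cannot be found.
    """
    if not road:
        return None
    m = min(range(len(road)), key=lambda i: distances[i])
    dm = distances[m]
    return road[m] if dm < positioning_threshold else None
```

The comparison is strict, so a distance exactly equal to the threshold is treated as too far.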

In the embodiment shown in fig. 6, point P is lake-center building A. Assume that, through the calculation in step a1, the three points closest to lake-center building A are found in the set R corresponding to the circular lake road: reference point 1, the accident site and reference point 2.

Reference point 1, the accident site and reference point 2 are not on the same straight line, and lake-center building A is not in the triangular area enclosed by the three points; evidently, the accident handling personnel selected a point outside the circular lake road as the positioning reference for the accident site.

The possible situations are as follows:

Case 1: if the accident site is closest to lake-center building A and the distance between them is smaller than 10 m, the accident site can be accurately positioned;

Case 2: if the accident site is closest to lake-center building A and the distance between them is greater than or equal to 10 m, point P is judged to be too far from the road input by the accident handling personnel, and the final positioning point corresponding to the data to be processed cannot be found;

Case 3: if reference point 1 is closest to lake-center building A and the distance between them is smaller than 10 m, reference point 1 is taken as the final positioning point corresponding to the data to be processed. Although reference point 1 and the accident site are not the same point, the average distance between longitude and latitude points on the circular lake road after the expansion point density operation is smaller than 20 m and both points are close to lake-center building A; therefore, even if reference point 1 is judged to be the final positioning point, such a distance error is acceptable in practical applications where the result is used as statistical data.

In specific applications, if a more accurate result is required, the accuracy of the final result can be controlled by adjusting the values of the positioning threshold and the distance threshold. In the method, if point P cannot be found, indicating that the original data is wrong, an explicit prompt is given in step S7. If point P does exist but the method concludes that the final positioning point corresponding to the data to be processed cannot be found, it can be judged that the reference point input by the accident handling personnel is too far from the actual accident site, and the triangular area enclosed by Ri, Rj and Rk is fed back to the user as the possible accident area. The method can locate the accident site at a real place on the accident road; even when an abnormal result occurs, the possible range of the accident site is narrowed. Compared with other positioning methods in the prior art, the accuracy of accident site positioning from accident information described in natural language is effectively improved.

A natural language processing-based traffic accident location system, as shown in fig. 2, comprising: the system comprises an accident site analysis module 10, a road retrieval module 20, a road encryption module 30, a geocoding module 40 and a projection positioning module 50.

After the data to be processed is sent to the accident site analysis module 10, analyzing and obtaining accident positioning information corresponding to the data to be processed by the accident site analysis module 10 based on a stop word list and a regular expression for word segmentation; the accident positioning information includes: cities, roads, streets and places where accidents occur; and the accident positioning information is fed into the road retrieval module 20 and the geocoding module 40.

After the accident site analysis module 10 extracts the accident positioning information, it standardizes the text of the accident positioning information and unifies its data format into standardized location information: XX city, XX district (county or county-level city), XX street, XX road, XX.

The road retrieval module 20 uses a retrieval interface to retrieve road information of an accident road based on an offline map interface and takes city and road names as inputs, acquires corresponding digital road information data based on longitude and latitude description, and obtains an initial road longitude and latitude data set; and feeds the initial road longitude and latitude data set into the road encryption module 30.

In the road encryption module 30, it is judged whether the accident occurrence road is an information-sparse road; if yes, the expansion point density operation is performed on the sparse initial road longitude and latitude data set, expanding the overly sparse digital road array into a denser array to obtain the longitude and latitude data set R of the road to be processed, R = [R1, R2, …, Rn]; otherwise, if the accident occurrence road has enough longitude and latitude points, the initial road longitude and latitude data set is directly recorded as: the longitude and latitude data set R of the road to be processed; the data set R is then fed into the projection positioning module 50.

In the geocoding module 40, the accident positioning information, in the form of standardized location information input from the accident site analysis module 10, is converted into the corresponding longitude and latitude coordinate point using the geocoding interface, recorded as: the original accident site P; the original accident site P is then fed into the projection positioning module 50.

The geocoding module 40 further includes a reliability analysis operation, where when the geocoding interface returns the longitude and latitude coordinate points corresponding to the accident positioning information, the returned data is recorded as: data to be judged; the geocoding module 40 performs reliability analysis on the data to be judged, calculates confidence of the data to be judged, and if the confidence is greater than 60, considers the coding result to be reliable, and sets the longitude and latitude as an original accident site P; if not, refining the accident site text information, and re-sending the accident site text information to the geocoding interface until the confidence coefficient of the data to be judged is more than 60.

In practical applications, the reliability analysis can be implemented with existing reliability analysis techniques. The geocoding interface provided by a Web service API may also return a confidence value together with the longitude and latitude coordinate point; when the accuracy requirement is low, this confidence can be used directly.

The operation of refining the accident site text information comprises: taking the accident positioning text last sent to the geocoding interface, adding preset high-frequency address words one by one to the corresponding street, road and place fields, and sending the text to the geocoding interface for repositioning each time one high-frequency address word has been added;

A specific example of refining the accident site text information is as follows: the standardized location information sent to the geocoding module 40 by the accident site analysis module 10 is: A city, B district, C street, D road; the confidence of the data to be judged returned for it in the geocoding module 40 is 40, so the standardized location information is refined in the geocoding module 40 to: bus station platform, D road, C street, B district, A city; coordinate point conversion is carried out again through the geocoding interface, and reliability analysis is carried out on the newly obtained data to be judged; this is repeated until the confidence is greater than 60, ensuring the accuracy of subsequent calculation.
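The confidence check and refinement loop can be sketched as below. This is an illustration under stated assumptions: `geocode` is a caller-supplied stand-in for the online geocoding interface, the high-frequency words are simply appended to the whole address rather than to individual fields, and the loop stops when the word list is exhausted (the description itself only says to repeat until the confidence exceeds 60).

```python
def geocode_with_refinement(address, geocode, high_freq_words, min_confidence=60):
    """Return [lon, lat] once the confidence exceeds min_confidence, else None.

    `geocode` is a hypothetical callable: text -> (lon, lat, confidence),
    or None when the interface returns nothing usable.
    """
    candidates = [address] + [address + word for word in high_freq_words]
    for text in candidates:
        result = geocode(text)
        if result is None:
            continue
        lon, lat, confidence = result
        if confidence > min_confidence:
            return [lon, lat]  # reliable: use as original accident site P
    return None
```

Passing the geocoder in as a parameter keeps the sketch independent of any particular map provider and makes it easy to test with a fake.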

The projection positioning module performs road projection calculation based on the longitude and latitude data set R of the road to be processed, wherein the longitude and latitude data set R of the road to be processed meets the calculation requirement, calculates the position relationship between the P point and the road represented by the longitude and latitude data set R of the road to be processed, and further judges the final positioning point corresponding to the data to be processed.

In the method, accurately positioning the traffic accident site on the road requires accurate road data. At present, electronic map providers offer road retrieval through offline maps, but the returned road points can be sparse.

Because the amount of data after encryption (point-density expansion) processing is fixed, the method uses a stack as the temporary storage container for data processing, which is easy to implement and keeps the calculation simple.

1.1: Importing accident data;

1.2: all accident data are stacked;

1.3: taking out the accident data of the stack top;

1.4: making a stop word list which, in addition to the commonly used Baidu stop words, includes words that frequently occur in traffic accident information such as road, place and intersection; writing regular expressions for word segmentation; and performing Chinese word segmentation on contents of the accident information such as the brief case description, the accident site and the basic facts;

1.5: extracting information of the accident city, county, street, road and accident site in the text information according to the word segmentation result;

1.6: inputting the acquired city and road information into a road retrieval interface of an offline electronic map, and acquiring a road longitude and latitude data set R of an accident, so as to complete a road retrieval function;

1.7: calculating the space distance between adjacent points in the longitude and latitude data set R of the road based on a space point distance algorithm;

1.8: calculating the average of the distances between all adjacent points, namely the average distance between adjacent points, mean_dist; if mean_dist is larger than the distance threshold of 20 m, executing 1.9; otherwise, executing 1.10;

1.9: first calculating the number inset_num of longitude and latitude coordinate points to be inserted, then interpolating the road data using the mean filling method;

The calculation method of the interpolation number between adjacent points is as follows:

inset_num = ceil(mean_dist/20)

wherein inset_num is the number of points to be inserted, ceil is an upward rounding algorithm, and mean_dist is the average distance between all adjacent points;

after interpolation of the road data, the new encrypted road data points R = [R1, R2, …, Rn] are obtained, where Ri = [lat_i, lon_i] (i = 1, 2, 3, …, n).

1.10: Judging whether the data in the accident data stack is empty or not, if not, repeatedly executing S1.3-S1.8;

If yes, the algorithm is ended, all roads where the accident places are located are obtained, and the roads meet the conditions.
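Steps 1.7 to 1.9 can be sketched for a single road as follows. This is a minimal illustration, assuming road points are [lat, lon] pairs in degrees and that coordinates are interpolated linearly; `haversine_m` and `densify_road` are hypothetical names, and the stack handling of steps 1.2, 1.3 and 1.10 is left out.

```python
import math


def haversine_m(a, b):
    """Great-circle distance in metres between [lat, lon] points a and b."""
    lat_a, lon_a, lat_b, lon_b = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat_b - lat_a) / 2) ** 2
         + math.cos(lat_a) * math.cos(lat_b) * math.sin((lon_b - lon_a) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(h)) * 1000


def densify_road(road, distance_threshold=20.0):
    """Interpolate only when the road is information-sparse."""
    if len(road) < 2:
        return list(road)
    gaps = [haversine_m(p, q) for p, q in zip(road, road[1:])]
    mean_dist = sum(gaps) / len(gaps)
    if mean_dist <= distance_threshold:
        return list(road)  # dense enough, nothing to insert
    inset_num = math.ceil(mean_dist / distance_threshold)
    dense = []
    for p, q in zip(road, road[1:]):
        for k in range(inset_num + 1):  # p itself plus inset_num new points
            t = k / (inset_num + 1)
            dense.append([p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t])
    dense.append(list(road[-1]))
    return dense
```

Because inset_num is derived from the mean spacing, unusually long individual gaps may still exceed the threshold after one pass.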

In the application, the geocoding module 40 converts the standardized location text of the accident site into longitude and latitude coordinate points; as shown in fig. 4, the specific implementation process is as follows:

2.1: importing a standardized traffic accident site data set and stacking;

2.2: taking out the position data of the stack top;

2.3: inputting data into a geocode interface address and initiating a network request;

2.4: detecting whether the returned result is abnormal or not, if so, setting the final result as a null value, and if not, carrying out the next step;

2.5: calculating confidence of the returned result;

2.6: if the confidence is greater than 60, the coding result is considered reliable and the longitude and latitude are set as the result; if not, the accident site text information is refined and steps 2.3-2.6 are carried out again;

2.7: pushing the encoded data onto the result stack;

2.8: the accident site data is popped;

2.9: judging whether the accident site data stack is empty or not, if not, re-executing 2.2-2.9, and if so, ending.

In the projection positioning module 50, a location mapping method based on distance projection is adopted, so that geocoded accident coordinates that do not fall on the road can be projected onto the road to obtain the final longitude and latitude coordinates. The specific algorithm is as follows:

3.1: stacking longitude and latitude points calculated by the accident site geocoding module 40;

3.2: taking out a stack top data point P;

3.3: based on the standardized accident site data obtained by analysis of the traffic accident site analysis module 10, acquiring a road where the accident site is located, and taking out a road data coordinate set R processed by the road retrieval module 20 and the road encryption module 30;

3.4: calculating the distances between point P and all points in the set R, using the spatial distance calculation of step a1, to obtain the distance set D;

3.5: according to D, taking out the 3 points Ri, Rj and Rk in R closest to P;

3.6: if Ri, Rj and Rk are on the same straight line, executing 3.7;

otherwise, executing 3.8;

3.7: judging whether P and the points Ri, Rj and Rk are on the same straight line; if so, judging point P as the final positioning point corresponding to the data to be processed;

otherwise, executing 3.9;

3.8: judging whether P is in the triangular area enclosed by the three points Ri, Rj and Rk; if so, point P is the final positioning point, and the process jumps directly to 3.11; if not, going to the next step 3.9;

3.9: according to D, taking out the point Rm in R closest to P, and obtaining the distance dist between P and Rm from D;

3.10: if dist is less than 10 m, Rm is the final positioning point; if not, an accurate positioning point cannot be obtained for point P;

3.11: stacking the data into a new positioning stack, and popping the P point;

3.12: judging whether the original positioning site stack is empty or not, if not, re-executing the steps 3.2-3.11, and if so, ending the algorithm.

With the technical scheme of the application, traffic accident site positioning is taken as the starting point, and the difficulties of acquiring accident road data and of accurately positioning the accident site are both addressed. On the basis of parsing the accident text information, the longitude and latitude data set of the road is acquired from an electronic map, and an accurate accident site positioning method based on projection mapping is provided, which ensures that the positioning point falls on the road where the accident happened. The method can be applied both to real-time traffic accident information and to historical data. The text parsing method provided by the application can also serve as a reference for natural language processing in other machine learning tasks.

Claims

1. A method for locating a traffic accident location based on natural language processing, comprising the following steps:

S1: Obtain information data of the location of the traffic accident to be processed based on natural language expression, recorded as: data to be processed;

It is characterized in that it also includes the following steps:

S2: Create a stop word list for traffic accident location information analysis and a regular expression for word segmentation;

S3: Analyze the data to be processed based on the stop word list and the regular expression to obtain corresponding accident location information;

The accident location information includes: the city, road, street and location where the accident occurred;

S4: Based on the road retrieval technology, the digital road information data based on the longitude and latitude description corresponding to the road where the accident occurred is obtained, and the longitude and latitude data set describing the road where the accident occurred is obtained, which is recorded as: the initial road longitude and latitude data set;

S5: Based on the data in the initial road longitude and latitude data set, determine whether the road where the accident occurred is an information-sparse road;

If yes, execute step S6;

Otherwise, the initial road longitude and latitude data set is recorded as the road longitude and latitude data set R to be processed, and step S7 is executed;

R=[R1, R2, ..., Rn], where n is the number of longitude and latitude points included in the data set, and is a positive integer;

S6: performing an expansion point density operation on the initial road longitude and latitude data set to expand the number of data in the data set to obtain a digital road set, which is recorded as: a road longitude and latitude data set R to be processed;

S7: using geocoding technology, the text information of the accident location in the accident location information is converted into latitude and longitude digital information location, recorded as: original accident location P;

If the original accident location P can be found, execute step S8;

Otherwise, point P cannot be found, the original data is judged to be incorrect, an error message is returned, and the calculation ends;

S8: Performing road projection calculation on the original accident location P and the road latitude and longitude data set R to be processed, and determining the final positioning point corresponding to the data to be processed;

The road projection calculation specifically includes the following steps:

a1: Calculate the spatial distance between point P and all points in the road latitude and longitude dataset R to be processed, and find the three points Ri, Rj, Rk closest to P in R;

a2: Determine the positional relationship between points Ri, Rj and Rk. If Ri, Rj and Rk are on the same straight line, execute step a3;

Otherwise, the three points are not collinear, and step a4 is executed;

a3: Determine whether point P and points Ri, Rj and Rk are on the same straight line. If so, determine that point P is the final positioning point corresponding to the data to be processed;

Otherwise, execute step a5;

a4: Draw a triangle with points Ri, Rj and Rk as endpoints, denoted as: positioning area;

Determine whether point P is within the positioning area;

If yes, then point P is determined to be the final positioning point corresponding to the data to be processed;

Otherwise, execute step a5;

a5: Find the point in R that is closest to P, recorded as: the closest point Rm;

The distance between Rm and P is recorded as: Dm;

a6: Compare Dm with the preset positioning threshold;

If Dm<positioning threshold, then point Rm is determined to be the final positioning point corresponding to the data to be processed;

Otherwise, it is determined that the final positioning point corresponding to the data to be processed cannot be found;

In step S6, the expansion point density operation includes the following steps:

c1: confirm all adjacent points in the initial road longitude and latitude dataset;

c2: Calculate the number of interpolation points that need to be inserted, and then use mean filling to interpolate between all adjacent points to obtain: the intermediate data set Rt;

inset_num = ceil(mean_dist/distance threshold),

Among them, inset_num is the number of points to be inserted, ceil is the rounding up algorithm, mean_dist is the average distance between all adjacent points; the distance threshold is the preset ideal distance between adjacent points;

c3: record the intermediate data set Rt as the longitude and latitude data set R of the road to be processed;

In step S7, the method for locating the original accident location P comprises the following steps:

d1: Determine to use an online geocoding service interface of an online map, transcode the parsed city, road and accident location information and write them into the online geocoding service interface;

d2: Access the online geocoding service interface to obtain the returned json data;

d3: Parse the JSON data, extract the longitude and latitude coordinates therein, and obtain the original accident location P;

P = [lon_P, lat_P], where lon represents the longitude coordinate and lat represents the latitude coordinate.

2. The method for locating a traffic accident location based on natural language processing according to claim 1, characterized in that: based on longitude and latitude, the method for calculating the spatial distance between two points is:

Assume that the points involved in the calculation are x and y, lon_x represents the longitude information of point x, lat_x represents the latitude information of point x, lon_y represents the longitude information of point y, and lat_y represents the latitude information of point y; then, the calculation process of the distance dist from point x to point y is:

lat_x = lat_x × π/180;

lon_x = lon_x × π/180;

lat_y = lat_y × π/180;

lon_y = lon_y × π/180;

△lat = lat_y – lat_x;

△lon = lon_y – lon_x;

a = sin(△lat / 2)^2 + cos(lat_y) * cos(lat_x) * sin(△lon / 2)^2;

c = 2 * arcsin(sqrt(a));

dist = 6371 * c * 1000.

3. The method for locating a traffic accident site based on natural language processing according to claim 1, characterized in that: in step S5, the method for determining the information-sparse road is:

b1: Calculate the spatial distances between all adjacent points in the road longitude and latitude data set to be processed;

b2: Calculate the average of all distances to get the average distance between adjacent points;

b3: Compare the average distance between adjacent points with a preset distance threshold.

4. The method for locating a traffic accident location based on natural language processing according to claim 1, characterized in that: in step a6, when it is determined that the final positioning point corresponding to the data to be processed cannot be found, the positioning area with points Ri, Rj and Rk as endpoints is fed back to the user as a reference area of the accident site.

5. A traffic accident location positioning system based on natural language processing, used to implement the traffic accident location positioning method according to claim 1, characterized in that it comprises: an accident location analysis module, a road retrieval module, a road encryption module, a geocoding module and a projection positioning module;

After the data to be processed is sent to the accident location analysis module, the accident location analysis module analyzes the data to be processed based on the stop word list and the regular expression used for word segmentation to obtain the accident location information corresponding to the data to be processed; the accident location information includes: the city, road, street and location where the accident occurred; and the accident location information is sent to the road retrieval module and the geocoding module;

The road retrieval module retrieves road information of the road where the accident occurred based on the offline map interface, obtains corresponding digitized road information data based on longitude and latitude description, and obtains an initial road longitude and latitude data set; and sends the initial road longitude and latitude data set to the road encryption module;

In the road encryption module, it is determined whether the road where the accident occurred is an information-sparse road; if so, the sparse initial road longitude and latitude data set is subjected to a point density expansion operation, and the overly sparse digital road array is expanded into a relatively dense array to obtain a road longitude and latitude data set R to be processed, where R=[R1, R2, ..., Rn]; otherwise, if the longitude and latitude points of the road where the accident occurred are sufficient, the initial road longitude and latitude data set is directly recorded as: road longitude and latitude data set R to be processed; and the road longitude and latitude data set R to be processed is sent to the projection positioning module;

In the geocoding module, the accident location information transmitted by the accident location analysis module is converted into corresponding longitude and latitude coordinate points using the geocoding interface, which are recorded as: original accident location P; and the original accident location P is sent to the projection positioning module;

The projection positioning module performs road projection calculation based on the received original accident location P and the longitude and latitude data set R of the road to be processed, and determines the final positioning point corresponding to the data to be processed.

6. A traffic accident location positioning system based on natural language processing according to claim 5, characterized in that the stop word list includes: location adverbs, intersection, road, section and place.

7. The traffic accident location positioning system based on natural language processing according to claim 5, characterized in that after the accident location analysis module extracts the accident location information, it standardizes the text of the accident location information.

8. The traffic accident location positioning system based on natural language processing according to claim 5, characterized in that: the geocoding module also includes a credibility analysis operation; when the geocoding interface returns the latitude and longitude coordinates corresponding to the accident location information, the returned data is recorded as: data to be judged; the geocoding module performs a credibility analysis on the data to be judged and calculates the confidence of the data to be judged; if the confidence is greater than 60, the coding result is considered reliable, and the longitude and latitude are set as the original accident location P; if not, the accident location text information is refined and re-sent to the geocoding interface until the confidence of the data to be judged is greater than 60;

The operation of refining the text information of the accident location includes: obtaining the text of the accident location information of the last geocoding interface, adding preset high-frequency address words to the fields corresponding to streets, roads and places respectively, and sending it to the geocoding interface for re-location after adding a high-frequency address word each time.