
CN116992858B - A method and system for locating traffic accident locations based on natural language processing - Google Patents

A method and system for locating traffic accident locations based on natural language processing

Info

Publication number
CN116992858B
Authority
CN
China
Prior art keywords
road
accident
longitude
latitude
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310961238.XA
Other languages
Chinese (zh)
Other versions
CN116992858A (en
Inventor
黄钢
高岩
许卉莹
李平凡
瞿伟斌
邓毅萍
张爱红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Traffic Management Research Institute of Ministry of Public Security
Original Assignee
Traffic Management Research Institute of Ministry of Public Security
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Traffic Management Research Institute of Ministry of Public Security
Priority to CN202310961238.XA
Publication of CN116992858A
Application granted
Publication of CN116992858B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • G08G1/0129 Traffic data processing for creating historical data or processing based on historical data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Remote Sensing (AREA)
  • Fuzzy Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present application provides a method for locating traffic accident sites based on natural language processing. It addresses the problem that spatial positioning performed directly on accident information entered in natural language is inaccurate. A text parsing method that combines a stop word list with regular expressions for word segmentation yields standardized accident site text, from which the accident positioning information is obtained. Road retrieval technology is used to obtain a digital road information data set R, described in longitude and latitude, for the road on which the accident occurred, and geocoding technology resolves the accident site text in the accident positioning information into a longitude and latitude point, the original accident site P. A road projection calculation then projects point P onto the data set R, so that the final positioning point corresponding to the data to be processed is found accurately on the road where the accident occurred.

Description

Traffic accident location method and system based on natural language processing
Technical Field
The invention relates to the technical field of intelligent traffic control, in particular to a traffic accident location method and system based on natural language processing.
Background
Locating traffic accident sites has long been a major obstacle to improving traffic management. At present, many internet platforms provide open map interfaces with rich functionality, and positioning information can be obtained from them as needed. When these interfaces are applied to accident site positioning, however, in most cases the result is difficult to place on a road. Practitioners therefore mostly build on the geocoding methods of existing platforms in an attempt to obtain accurate accident locations and to provide traffic management departments with a more refined data source.
Chinese patent document CN 112052908A provides a traffic accident site clustering method. First, the accident site data recorded in a traffic accident information collection table are converted into longitude and latitude coordinate points; second, the coordinate points in the original coordinate system are converted into coordinate points in the target coordinate system; then the distances between the coordinate points are calculated in the target coordinate system; finally, accidents with the same spatial distribution characteristics are clustered according to the spatial distribution characteristics of the accident sites. That method concentrates on processing location data after positioning has been done and does not fully consider the accuracy of the positioned points.
Chinese patent document CN 108320515B provides a method for automatically matching and checking traffic accident sites against the road network. It comprises: obtaining traffic accident attribute information; matching road names according to the national highway naming standard to obtain a mapping table of traffic road codes, national highway numbers and names; acquiring the accident longitude and latitude coordinates matched to the mileage stake number; checking the spatial relationship between the accident coordinates and the administrative division to obtain a first accident coordinate point that passes the check; checking whether the first accident coordinate point is consistent with the accident site description text to obtain a second accident coordinate point that passes the check; and generating a geographic data set for traffic accident analysis. That patent focuses on road network matching. It can accurately position accident sites on expressways that carry mileage stake numbers, but it cannot accurately position accidents on urban roads, where more accidents occur.
Chinese patent document CN 103631776B provides a method for automatically recording and positioning semantically expressed traffic accident site information. When the integrated accident data management system receives input accident data, it calls an accident site input and automatic positioning control. The control requests road name data and the road network topology from a road name dictionary server, receives the accident site information entered by the user, and performs automatic positioning according to that input. The positioning is then checked: if it is correct, it is submitted; otherwise the user manually drags the accident site icon to position it and then submits. Finally, the control returns the accident site entered by the user and the positioning information to the integrated accident data management system. That patent provides a method for entering accident site text in a prescribed form. It is suitable only for accident systems whose records are entered according to those rules, and not for the hundreds of millions of non-standardized accident site records in current accident systems.
Chinese patent document CN107270922B provides a traffic accident spatial positioning method based on a POI index, comprising the following steps. First step: screening POI place information to extract the GPS coordinates of specific traffic places. Second step: map-matching the GPS coordinates of the specific traffic places to obtain the set of road links within 30 m of the GPS coordinates. Third step: acquiring road grades from the map file and calculating the traffic flow direction of each road. Fourth step: from the traffic places and the road link set obtained in the second step, looking up the traffic flow direction and road grade from the third step for each road link number, and constructing a POI index table with four fields: traffic place, road link number, road grade and traffic flow direction. Fifth step: extracting traffic accident information from accident broadcast information. Sixth step: matching the traffic accident information obtained in the fifth step against the traffic place and traffic flow direction in the POI index table constructed in the fourth step. Finally, the spatial position is obtained from the road grade screening result. That method does not consider the specific road on which the accident occurred; it only acquires the set of road links within 30 m, and therefore cannot accurately position the accident on the road.
All four patent documents above relate to traffic accident site positioning. However, traffic databases already hold a large volume of recorded accident information. When those records were entered, traffic police mostly described the accident site in natural language, and the recorded descriptions are neither uniform nor standardized. With the prior-art techniques it is therefore difficult to identify the accident sites in the historical data automatically and accurately with a single unified method. When the historical data is needed, statistics can only be compiled manually, which is inefficient and error-prone.
Disclosure of Invention
Prior-art traffic accident site positioning methods produce positions that are not accurate enough and cannot comprehensively handle the accident site descriptions found in historical data. To solve these problems, the present application provides a traffic accident site positioning method based on natural language processing that is suitable for the various kinds of accident site descriptions found in historical data and ensures that the site is finally positioned on a road. The present application also provides a traffic accident site positioning system based on natural language processing.
The technical scheme of the invention is as follows: a traffic accident location method based on natural language processing comprises the following steps:
S1: acquiring traffic accident site information expressed in natural language that is to be processed, recorded as: data to be processed;
The method is characterized by further comprising the following steps:
S2: preparing a stop word list for parsing traffic accident site information and regular expressions for word segmentation;
S3: parsing the data to be processed based on the stop word list and the regular expressions to obtain the corresponding accident positioning information;
The accident positioning information includes: the city, road, street and place where the accident occurred;
S4: based on road retrieval technology, acquiring the digital road information data, described in longitude and latitude, that corresponds to the road on which the accident occurred, to obtain a longitude and latitude data set describing that road, recorded as: an initial road longitude and latitude data set;
S5: judging, based on the data in the initial road longitude and latitude data set, whether the road on which the accident occurred is an information sparse road;
if yes, executing step S6;
Otherwise, recording the initial road longitude and latitude data set as the longitude and latitude data set R of the road to be processed, and executing step S7;
R = [R1, R2, ..., Rn], where n, a positive integer, is the number of longitude and latitude points in the data set;
S6: performing a point-density expansion operation on the initial road longitude and latitude data set to increase the number of points in the data set, obtaining a digital road set recorded as: the longitude and latitude data set R of the road to be processed;
S7: using geocoding technology, resolving the accident site text in the accident positioning information into a longitude and latitude point, recorded as: the original accident site P;
if the original accident site P can be found, executing step S8;
Otherwise, if point P cannot be found, judging the original data to be erroneous, returning error information, and ending the calculation;
S8: performing a road projection calculation on the original accident site P and the longitude and latitude data set R of the road to be processed, and determining the final positioning point corresponding to the data to be processed;
The road projection calculation specifically comprises the following steps:
a1: calculating the spatial distance between point P and every point in the longitude and latitude data set R of the road to be processed, and finding the 3 points Ri, Rj and Rk in R that are nearest to P;
a2: judging the positional relationship of points Ri, Rj and Rk; if Ri, Rj and Rk are on the same straight line, executing step a3;
Otherwise, the three points are not collinear, and step a4 is executed;
a3: judging whether P is on the same straight line as points Ri, Rj and Rk; if so, judging point P to be the final positioning point corresponding to the data to be processed;
Otherwise, executing step a5;
a4: drawing a triangle with points Ri, Rj and Rk as its vertices, recorded as: the positioning area;
Judging whether point P is inside the positioning area;
if yes, judging point P to be the final positioning point corresponding to the data to be processed;
Otherwise, executing step a5;
a5: finding the point in R that is closest to P, recorded as: the closest point Rm;
The distance between Rm and P is recorded as: Dm;
a6: comparing Dm with a preset positioning threshold;
If Dm is smaller than the positioning threshold, judging point Rm to be the final positioning point corresponding to the data to be processed;
otherwise, judging that no final positioning point corresponding to the data to be processed can be found.
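Steps a1 to a6 can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the function names `haversine_m`, `cross` and `project_to_road`, the default `threshold_m=50.0` and the tolerance `eps` are assumptions introduced here, and the collinearity and point-in-triangle tests are done on raw (longitude, latitude) values as a planar approximation, which the source does not specify. The sketch assumes R contains at least three points.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a)) * 1000

def cross(o, a, b):
    """2D cross product of vectors o->a and o->b (planar approximation)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def project_to_road(P, R, threshold_m=50.0, eps=1e-12):
    """Return (final_point, reference_triangle) following steps a1-a6.

    final_point is None when no positioning point is found; in that case
    reference_triangle holds (Ri, Rj, Rk) as a reference area.
    """
    # a1: the three points of R nearest to P
    ri, rj, rk = sorted(R, key=lambda r: haversine_m(P, r))[:3]
    # a2: are Ri, Rj, Rk collinear?
    if abs(cross(ri, rj, rk)) < eps:
        # a3: P on the same line -> P is the final point
        if abs(cross(ri, rj, P)) < eps:
            return P, None
    else:
        # a4: P inside the triangle Ri-Rj-Rk -> P is the final point
        d1, d2, d3 = cross(ri, rj, P), cross(rj, rk, P), cross(rk, ri, P)
        mixed_signs = min(d1, d2, d3) < 0 and max(d1, d2, d3) > 0
        if not mixed_signs:
            return P, None
    # a5: closest point Rm (ri is already the nearest) and its distance Dm
    rm, dm = ri, haversine_m(P, ri)
    # a6: accept Rm only if Dm is below the positioning threshold
    if dm < threshold_m:
        return rm, None
    return None, (ri, rj, rk)
```

For example, with a straight road `R = [(120.0, 31.0), (120.001, 31.0), (120.002, 31.0)]`, a geocoded point about 29 m off the road, such as `(120.0012, 31.0002)`, is snapped to the nearest road point `(120.001, 31.0)`.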
It is further characterized by:
The spatial distance between two points is calculated from their longitude and latitude as follows:
Let the two points be x and y, where lon_x and lat_x are the longitude and latitude of point x and lon_y and lat_y are the longitude and latitude of point y, all in degrees. The distance dist from point x to point y, in metres, is calculated as:
lat_x = lat_x × π / 180;
lon_x = lon_x × π / 180;
lat_y = lat_y × π / 180;
lon_y = lon_y × π / 180;
Δlat = lat_y - lat_x;
Δlon = lon_y - lon_x;
a = sin(Δlat / 2)^2 + cos(lat_y) × cos(lat_x) × sin(Δlon / 2)^2;
c = 2 × arcsin(sqrt(a));
dist = 6371 × c × 1000;
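The formula above is the haversine great-circle distance, with 6371 as the Earth's mean radius in kilometres. A direct transcription into Python might look like the sketch below; the function name `spatial_distance` and its argument order are assumptions made for illustration.

```python
import math

def spatial_distance(lon_x, lat_x, lon_y, lat_y):
    """Distance in metres between points x and y given in degrees."""
    # Convert degrees to radians
    lat_x, lon_x = lat_x * math.pi / 180, lon_x * math.pi / 180
    lat_y, lon_y = lat_y * math.pi / 180, lon_y * math.pi / 180
    d_lat = lat_y - lat_x
    d_lon = lon_y - lon_x
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(lat_y) * math.cos(lat_x) * math.sin(d_lon / 2) ** 2)
    c = 2 * math.asin(math.sqrt(a))
    return 6371 * c * 1000  # kilometres to metres
```

One degree of latitude along a meridian comes out at roughly 111.2 km, which is a quick sanity check for the conversion.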
In step S5, the method for judging an information sparse road includes:
b1: calculating the spatial distance between all adjacent points in the initial road longitude and latitude data set;
b2: calculating the average of all these distances to obtain the average distance between adjacent points;
b3: comparing the average distance between adjacent points with a preset distance threshold; a road whose average distance exceeds the threshold is treated as an information sparse road;
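Steps b1 to b3 can be sketched as below. The source only says that the average distance is compared with a threshold; treating "average greater than threshold" as sparse is an assumption of this sketch, as are the names `haversine_m` and `is_sparse`.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a)) * 1000

def is_sparse(points, distance_threshold_m):
    """Judge whether an ordered list of (lon, lat) road points is sparse."""
    if len(points) < 2:
        raise ValueError("need at least two road points")
    # b1: distance between every pair of adjacent points
    dists = [haversine_m(p, q) for p, q in zip(points, points[1:])]
    # b2: average distance between adjacent points
    mean_dist = sum(dists) / len(dists)
    # b3: compare with the preset threshold (assumed direction: larger = sparse)
    return mean_dist > distance_threshold_m
```

Using the average keeps the check to a single comparison per road; checking each adjacent pair separately would catch local gaps but costs more.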
In step S6, the point-density expansion operation includes the following steps:
c1: identifying all pairs of adjacent points in the initial road longitude and latitude data set;
c2: calculating the number of points to be inserted, and then interpolating between every pair of adjacent points by mean filling, to obtain an intermediate data set Rt;
inset_num = ceil(mean_dist / distance threshold)
where inset_num is the number of points to be inserted, ceil is the round-up (ceiling) function, mean_dist is the average distance between all adjacent points, and the distance threshold is the preset ideal distance between adjacent points;
c3: recording the intermediate data set Rt as: the longitude and latitude data set R of the road to be processed;
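A minimal sketch of steps c1 to c3 follows. The source does not define "mean filling" precisely; this sketch interprets it as inserting `inset_num` evenly spaced points on the straight segment between each adjacent pair, which is an assumption. The names `haversine_m` and `densify` are likewise hypothetical; `inset_num` and `mean_dist` follow the formula above.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a)) * 1000

def densify(points, distance_threshold_m):
    """Insert evenly spaced points between all adjacent (lon, lat) points."""
    # c1: adjacent pairs are consecutive entries of the ordered list
    pairs = list(zip(points, points[1:]))
    # c2: number of points to insert between each pair
    mean_dist = sum(haversine_m(p, q) for p, q in pairs) / len(pairs)
    inset_num = math.ceil(mean_dist / distance_threshold_m)
    dense = []
    for (lon1, lat1), (lon2, lat2) in pairs:
        dense.append((lon1, lat1))
        for k in range(1, inset_num + 1):
            t = k / (inset_num + 1)  # fraction of the way along the segment
            dense.append((lon1 + t * (lon2 - lon1), lat1 + t * (lat2 - lat1)))
    dense.append(points[-1])
    # c3: the intermediate data set Rt becomes R
    return dense
```

For two points about 111 m apart and a 50 m threshold, `inset_num` is 3, so the segment ends up with five points spaced roughly 28 m apart.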
In step S7, the method for locating the original accident site P includes the following steps:
d1: selecting the online geocoding service interface of an online map, transcoding the parsed city, road and accident place information, and writing it into the request to the online geocoding service interface;
d2: accessing the online geocoding service interface to obtain the returned JSON data;
d3: parsing the JSON data and extracting the longitude and latitude coordinate point from it to obtain the original accident site P;
P = [lon_P, lat_P], where lon denotes the longitude coordinate and lat denotes the latitude coordinate;
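Steps d1 and d3 can be illustrated without committing to any particular map provider. In the sketch below the base URL, the query parameter names (`address`, `output`) and the JSON layout (`result.location.lng` / `lat`) are all hypothetical placeholders; a real geocoding service defines its own request format and response schema, and step d2 (the HTTP request itself) is deliberately left out.

```python
import json
from urllib.parse import urlencode

def build_geocode_url(base_url, city, road, place):
    """d1: URL-encode the parsed address fields into a request URL."""
    query = urlencode({"address": " ".join([city, road, place]),
                       "output": "json"})
    return f"{base_url}?{query}"

def parse_geocode_response(json_text):
    """d3: extract P = [lon, lat] from the (assumed) JSON layout.

    Returns None when the response cannot be parsed or lacks a location,
    which corresponds to the 'P cannot be found' branch of step S7.
    """
    try:
        location = json.loads(json_text)["result"]["location"]
        return [float(location["lng"]), float(location["lat"])]
    except (ValueError, KeyError, TypeError):
        return None
```

A response such as `{"status": 0, "result": {"location": {"lng": 120.3, "lat": 31.5}}}` would yield `P = [120.3, 31.5]` under this assumed layout.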
In step a6, when it is determined that no final positioning point corresponding to the data to be processed can be found, the positioning area with points Ri, Rj and Rk as its vertices is fed back to the user as a reference area for the place where the accident occurred.
A traffic accident site positioning system based on natural language processing comprises: an accident site parsing module, a road retrieval module, a road densification module, a geocoding module and a projection positioning module;
After the data to be processed is sent to the accident site parsing module, the module parses it, based on the stop word list and the regular expressions for word segmentation, to obtain the accident positioning information corresponding to the data to be processed; the accident positioning information includes: the city, road, street and place where the accident occurred; the accident positioning information is sent to the road retrieval module and the geocoding module;
The road retrieval module retrieves road information for the road on which the accident occurred through an offline map interface, acquires the corresponding digital road information data described in longitude and latitude, and obtains the initial road longitude and latitude data set; the initial road longitude and latitude data set is sent to the road densification module;
The road densification module judges whether the road on which the accident occurred is an information sparse road; if yes, it performs the point-density expansion operation on the sparse initial road longitude and latitude data set, expanding the overly sparse digital road array into a denser array to obtain the longitude and latitude data set R of the road to be processed, R = [R1, R2, ..., Rn]; otherwise, the road has enough longitude and latitude points, and the initial road longitude and latitude data set is recorded directly as the longitude and latitude data set R of the road to be processed; the data set R is sent to the projection positioning module;
The geocoding module uses a geocoding interface to convert the accident positioning information sent by the accident site parsing module into the corresponding longitude and latitude coordinate point, recorded as: the original accident site P; the original accident site P is sent to the projection positioning module;
The projection positioning module performs the road projection calculation on the received original accident site P and longitude and latitude data set R of the road to be processed, and determines the final positioning point corresponding to the data to be processed.
It is further characterized by:
The stop word list includes: place adverbs, intersections, roads, road sections and places;
After the accident site parsing module extracts the accident positioning information, it standardizes the information into the unified form: XX City, XX District, XX Street, XX Road, XX;
The geocoding module further performs a reliability analysis. When the geocoding interface returns the longitude and latitude coordinate point corresponding to the accident positioning information, the returned data is recorded as: data to be judged. The geocoding module analyzes the reliability of the data to be judged and calculates its confidence. If the confidence is greater than 60, the coding result is considered reliable and its longitude and latitude are set as the original accident site P; if not, the accident site text is refined and sent to the geocoding interface again, until the confidence of the data to be judged is greater than 60;
Refining the accident site text comprises: taking the accident positioning text that was last sent to the geocoding interface, adding preset high-frequency address words one at a time to the corresponding street, road and place fields, and sending the text to the geocoding interface for repositioning after each word is added;
The high-frequency address words include: bus stops, communities, hu-he, dao, lane, department, fork, street, and road.
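The confidence check and refinement loop can be sketched as below. The geocoder is passed in as a callable so that no real service API is assumed; `geocode_with_refinement`, the field names `street`, `road` and `place`, and the tuple `(lon, lat, confidence)` returned by the callable are hypothetical. The sketch appends one word at a time to the originally parsed text and stops when the candidates are exhausted, which bounds the "until the confidence is greater than 60" loop described in the source; both choices are assumptions.

```python
def geocode_with_refinement(fields, geocode, address_words, min_confidence=60):
    """Return [lon, lat] of the first result whose confidence exceeds the limit.

    fields        -- dict with 'street', 'road' and 'place' text (hypothetical keys)
    geocode       -- callable taking such a dict, returning (lon, lat, confidence)
    address_words -- preset high-frequency address words to try
    """
    # First candidate: the parsed text as it is
    candidates = [dict(fields)]
    # Then one extra word at a time in the street, road and place fields
    for name in ("street", "road", "place"):
        for word in address_words:
            refined = dict(fields)
            refined[name] = f"{fields.get(name, '')} {word}".strip()
            candidates.append(refined)
    for candidate in candidates:
        lon, lat, confidence = geocode(candidate)
        if confidence > min_confidence:
            return [lon, lat]  # reliable: use as original accident site P
    return None  # no candidate reached the required confidence
```

Returning `None` when every candidate fails lets the caller fall back to the error branch of step S7 instead of looping indefinitely.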
The traffic accident site positioning method based on natural language processing provided by the present application addresses the problem that spatial positioning performed directly on accident information entered in natural language is inaccurate. A text parsing method that combines a stop word list with regular expressions for word segmentation yields standardized accident site text, from which the accident positioning information is obtained. Road retrieval technology is used to obtain the digital road information data set R, described in longitude and latitude, for the road on which the accident occurred, and geocoding technology resolves the accident site text in the accident positioning information into a longitude and latitude point, the original accident site P. A road projection calculation then projects point P onto the data set R. Even if the accident place entered by the case handler is not on the road represented by R, a mapping point for P can still be found on that road, so the final positioning point corresponding to the data to be processed is found accurately on the road where the accident occurred. The method is suitable for the various kinds of accident site descriptions found in historical data and ensures that the site is finally positioned on the accident road.
Drawings
FIG. 1 is a flow chart of the steps of the traffic accident site positioning method provided by the invention;
FIG. 2 is a schematic structural diagram of the traffic accident site positioning system provided by the invention;
FIG. 3 is a schematic diagram of the road retrieval and densification process of the invention;
FIG. 4 is a flow chart of the geocoding of an accident site provided by the invention;
FIG. 5 is a flow chart of the projection mapping and positioning of accident sites provided by the invention;
FIG. 6 is a schematic view of an accident site in one embodiment.
Detailed Description
In the embodiment shown in the schematic view of the accident site in FIG. 6, Ring Lake Road runs around a central lake; the lake contains a central island on which Lake Center Building A and Lake Center Building B stand, and Ring Lake Road has branches 1 to 3. If a traffic accident occurs on Ring Lake Road at a position between branch 1 and branch 3, accident handlers, both in current practice and in the historical data, often describe the site with wording such as "Ring Lake Road, opposite Lake Center Building A". However, the actual address of Lake Center Building A is not on Ring Lake Road. If "Ring Lake Road" and "Lake Center Building A" are extracted and used directly for retrieval, an accurate accident site cannot be obtained. With the method of the present application, the accident can be positioned at its actual place of occurrence on Ring Lake Road based on the description entered by the accident handler.
To parse accident addresses expressed in natural language accurately, the present application provides a traffic accident site positioning method based on natural language processing, which comprises the following steps, as shown in FIG. 1.
S1: acquiring information data of a traffic accident site to be processed based on natural language expression, and recording the information data as: data to be processed.
In this embodiment, examples of the data to be processed based on the natural language expression are as follows:
Case details: At X hours X minutes on X month X, 2023, a vehicle with plate number Su XXXXX, travelling in Binhu District, Wuxi, reached the point on Ring Lake Road opposite Lake Center Building A and collided with an electric bicycle ridden by Wang X, causing personal injury and vehicle damage.
Accident site: Binhu District, Ring Lake Road, opposite Lake Center Building A.
S2: manufacturing a stop word list for analyzing traffic accident location information and a regular expression for word segmentation;
the word segmentation content of the regular expression for word segmentation comprises: case, accident address and basic facts.
Stop words are characters or words that, in information retrieval, are automatically filtered out before or after natural language data (text) is processed, in order to save storage space and improve search efficiency. Stop words are entered manually. In the present application the stop word list is compiled specifically from historical traffic accident data, so the resulting stop words are targeted to this task. For example, the stop word list includes: place adverbs, intersections, roads, road sections and places.
Place adverbs include words such as "at", "located", "open" and "to". In practice, the place adverbs commonly used to describe traffic accident information can be obtained by statistics on historical data.
The data to be processed in the present application is information described in natural language. A regular expression is a text matching pattern that can search for specific character strings. It is an important technical component of natural language processing and is commonly used in fields such as speech recognition and translation software. It can be implemented with the prior art and is not described further here.
S3: based on the stop word list and the regular expression for word segmentation, analyzing the data to be processed by using a word segmentation algorithm in the prior art to obtain corresponding accident positioning information;
the accident positioning information includes: cities, roads, streets and places where accidents occur.
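The parsing in steps S2 and S3 can be illustrated with a small sketch. Everything concrete in it is an assumption made for illustration: the English sample record, the stop words, the field names (`city`, `district`, `road`, `place`) and the regular expressions. Real records are Chinese text and would need a stop word list and patterns compiled from historical accident data, as the description explains.

```python
import re

# Hypothetical stop words (place adverbs) for the English sample record
STOP_WORDS = ["located", "at", "to"]

# Hypothetical pattern splitting a cleaned address into its parts
ADDRESS_RE = re.compile(
    r"(?:(?P<city>.+? City)\s*)?"
    r"(?:(?P<district>.+? District)\s*)?"
    r"(?:(?P<road>.+? Road)\s*)?"
    r"(?P<place>.*)"
)

def parse_accident_site(record):
    """Extract accident positioning information from a free-text record."""
    # Regular expression for segmentation: isolate the accident site field
    field = re.search(r"Accident site:\s*(?P<site>.+)", record)
    if field is None:
        return None
    site = field.group("site")
    # Remove stop words as whole words, then collapse the whitespace
    stop_re = r"\b(?:" + "|".join(map(re.escape, STOP_WORDS)) + r")\b"
    site = " ".join(re.sub(stop_re, " ", site, flags=re.IGNORECASE).split())
    # Split the standardized text into city / district / road / place
    return ADDRESS_RE.match(site).groupdict()
```

For the sample record `"Accident site: located at Wuxi City Binhu District Ring Lake Road opposite Lake Center Building A"`, the sketch yields the road `"Ring Lake Road"` for road retrieval (step S4) and the place `"opposite Lake Center Building A"` for geocoding (step S7).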
S4: based on the road retrieval technology, acquiring digital road information data corresponding to the accident occurrence road and based on longitude and latitude description, and obtaining a longitude and latitude data set describing the accident occurrence road, wherein the longitude and latitude data set is recorded as: an initial road longitude and latitude data set.
The present application uses offline map road retrieval, for example the road search interface in the API provided by the Baidu offline map, to retrieve the description of the road on which the accident occurred. With the city name and road name as input, the digital information of the road where the accident site lies is retrieved; this information comprises a number of longitude and latitude points, which make up the initial road longitude and latitude data set. The number of points returned differs between roads and between search interfaces: for some roads the points are plentiful, while for others, such as remote rural roads, the points in the corresponding digital information are very sparse. To position the accident site accurately, the road information must be rich enough, so the method has to judge, based on the data in the initial road longitude and latitude data set, whether the road is an information sparse road.
S5: judging whether the accident occurrence road is an information sparse road or not based on data in the longitude and latitude data set of the initial road;
if yes, executing step S6;
Otherwise, the initial road longitude and latitude data set is recorded as the longitude and latitude data set R of the road to be processed, and step S7 is executed;
R = [R1, R2, …, Rn], where n is the number of longitude and latitude points included in the data set and is a positive integer.
In step S5, the method for judging the information sparse road is as follows:
b1: calculating the spatial distance between all adjacent points in a longitude and latitude data set of a road to be processed based on the longitude and latitude;
The spatial distance is the actual distance between two longitude and latitude points. Whether the road needs to be densified is judged from the average distance between all adjacent points; the distance between each pair of adjacent points could also be judged individually, but that requires more computation.
The specific calculation method is as follows:
lat_j = lat_j × π/180;
lon_j = lon_j × π/180;
lat_i = lat_i × π/180;
lon_i = lon_i × π/180;
△lat = lat_i – lat_j;
△lon = lon_i – lon_j;
a = sin(△lat / 2)^2 + cos(lat_i) * cos(lat_j) * sin(△lon / 2)^2;
c = 2 * arcsin(sqrt(a));
dist_ij = 6371 * c * 1000;
Wherein lat represents the latitude coordinate, lon represents the longitude coordinate, and dist_ij represents the distance in meters between adjacent points i and j;
b2: calculating the average value of all the distances to obtain the average distance between adjacent points;
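b3: comparing the average distance between adjacent points with a preset distance threshold; if the average distance is greater than the distance threshold, the road is judged to be an information sparse road.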
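Steps b1 to b3 can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the function names are hypothetical, road points are assumed to be [lat, lon] pairs in decimal degrees, and a road is assumed to count as sparse when its mean adjacent distance exceeds the threshold.

```python
import math

EARTH_RADIUS_M = 6371 * 1000  # the 6371 km radius used in the formula, in meters


def haversine_m(lat_i, lon_i, lat_j, lon_j):
    """Distance in meters between two points given in decimal degrees."""
    lat_i, lon_i, lat_j, lon_j = map(math.radians, (lat_i, lon_i, lat_j, lon_j))
    d_lat = lat_i - lat_j
    d_lon = lon_i - lon_j
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(lat_i) * math.cos(lat_j) * math.sin(d_lon / 2) ** 2)
    c = 2 * math.asin(math.sqrt(a))
    return EARTH_RADIUS_M * c


def mean_adjacent_distance(road):
    """Steps b1 and b2: average distance between consecutive [lat, lon] points."""
    if len(road) < 2:
        raise ValueError("a road needs at least two points")
    gaps = [haversine_m(*p, *q) for p, q in zip(road, road[1:])]
    return sum(gaps) / len(gaps)


def is_sparse(road, distance_threshold_m=20.0):
    """Step b3 (assumed rule): sparse when the mean gap exceeds the threshold."""
    return mean_adjacent_distance(road) > distance_threshold_m
```

The default of 20 m matches the distance threshold used in this embodiment; any other preset value can be passed instead.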
S6: performing the expansion point density operation on the sparse initial road longitude and latitude data set to expand the number of points in the data set, obtaining a digital road set recorded as: the longitude and latitude data set R of the road to be processed.
In step S6, the expansion point density operation includes the following steps:
c1: confirming all adjacent points in the initial road longitude and latitude data set;
c2: first calculating the number of points to be inserted, then interpolating between all adjacent points by mean filling; the longitude and latitude points of the initial data set together with the inserted points form an intermediate data set Rt;
inset_num = ceil(mean_dist / distance threshold)
Wherein inset_num is the number of points to be inserted between each pair of adjacent points, ceil rounds up to the nearest integer, and mean_dist is the average distance between all adjacent points; the distance threshold is the preset ideal spacing between adjacent points.
In this embodiment, the distance threshold is set to 20m, that is, when the distance between adjacent longitude and latitude points in the longitude and latitude data set R of the road to be processed is less than or equal to 20m, the point density of the longitude and latitude points required for calculation can be satisfied, and accurate positioning to the accident site can be ensured;
The mean filling method works as follows: taking two adjacent points as the start point and the end point, and after the value of inset_num has been calculated, inset_num points are added uniformly between the start point and the end point.
After the addition, every two adjacent points between the start point and the end point are the same distance apart. For example, if the distance between adjacent points A and B is 60 m and mean_dist for the road is 40 m, then:
inset_num = ceil(40/20) = 2;
Then, with A as the start point and B as the end point, 2 points are added evenly in between: c1 and c2;
The segment AB becomes three equidistant segments Ac1, c1c2 and c2B, each 20 m long.
c3: the intermediate data set Rt is recorded as: the longitude and latitude data set R of the road to be processed;
R = [R1, R2, …, Rn], where Ri = [lat_i, lon_i] (i = 1, 2, 3, …, n), i.e. each point contains latitude and longitude information.
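The expansion point density operation in steps c1 to c3 can be sketched as below. The function name densify is hypothetical, and the sketch assumes that linear interpolation of latitude and longitude is close enough to equal spacing over the short gaps involved; this application does not state how the inserted coordinates are computed.

```python
import math


def densify(road, mean_dist_m, distance_threshold_m=20.0):
    """Insert inset_num evenly spaced points between every pair of adjacent points."""
    # Same count for every gap, as in inset_num = ceil(mean_dist / distance threshold).
    inset_num = math.ceil(mean_dist_m / distance_threshold_m)
    dense = []
    for (lat_a, lon_a), (lat_b, lon_b) in zip(road, road[1:]):
        dense.append([lat_a, lon_a])
        for k in range(1, inset_num + 1):
            t = k / (inset_num + 1)  # fraction of the way from A to B
            dense.append([lat_a + t * (lat_b - lat_a),
                          lon_a + t * (lon_b - lon_a)])
    dense.append(list(road[-1]))  # the last original point closes the road
    return dense
```

With the worked example above (mean_dist of 40 m and a threshold of 20 m), two points are inserted into each gap, so a single segment AB becomes three equal segments.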
S7: using a geocoding technology, resolving accident location text information in accident positioning information into longitude and latitude digital information locations, and marking the longitude and latitude digital information locations as: the original accident site P. If the P point cannot be found, the original data is shown to be wrong, error information is returned, and the calculation is finished.
In step S7, the positioning method of the original accident site P includes the following steps:
d1: determining the geocoding interface of an online map to be used, such as the geocoding interface provided by the Baidu map API; transcoding the parsed city, road and accident site information and writing it into the online geocoding service interface, which resolves a structured address (province/city/district/street/house number) into the corresponding position coordinates;
d2: accessing an online geocoding service interface to acquire returned json data;
d3: analyzing json data, and taking out longitude and latitude coordinate points in the json data to obtain an original accident site P;
P = [lon_P, lat_P], where lon represents the longitude coordinate and lat represents the latitude coordinate.
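Step d3 can be sketched as follows. The JSON layout assumed here (a "result" object holding a "location" object with "lng" and "lat" fields) is a hypothetical example; the real field names depend on the geocoding service and should be taken from its documentation. The function name is likewise hypothetical.

```python
import json


def parse_geocode_response(text):
    """Return P = [lon, lat], or None when the response carries no usable point."""
    try:
        payload = json.loads(text)
        # Assumed layout: {"result": {"location": {"lng": ..., "lat": ...}}}
        location = payload["result"]["location"]
        return [float(location["lng"]), float(location["lat"])]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON or a missing field: treat P as not found.
        return None
```

Returning None corresponds to the case in step S7 where point P cannot be found and an error message is returned.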
S8: and carrying out road projection calculation on the original accident site P and the longitude and latitude data set R of the road to be processed, and judging a final positioning point corresponding to the data to be processed.
The road projection calculation specifically comprises the following steps:
a1: based on longitude and latitude, calculating the spatial distance between the P point and all points in the longitude and latitude data set R of the road to be processed, wherein the specific calculation method is as follows:
lat_P = lat_P × π/180;
lon_P = lon_P × π/180;
lat_i = lat_i × π/180;
lon_i = lon_i × π/180;
△lat = lat_i – lat_P;
△lon = lon_i – lon_P;
a = sin(△lat / 2)^2 + cos(lat_i) * cos(lat_P) * sin(△lon / 2)^2;
c = 2 * arcsin(sqrt(a));
dist_i = 6371 * c * 1000;
Wherein lon_P and lat_P respectively represent the longitude and latitude of the P point, lon_i and lat_i respectively represent the longitude and latitude of the i-th point in the set R, and dist_i represents the distance in meters from the P point to the point Ri; finally a distance set D = [dist_1, dist_2, dist_3, …, dist_n] is obtained;
finding the 3 points Ri, Rj and Rk in R nearest to P, namely the longitude and latitude points corresponding to the three smallest values in the set D, where Ri, Rj, Rk ∈ R;
a2: judging the positional relationship of points Ri, Rj and Rk; if Ri, Rj and Rk are on the same straight line, executing step a3;
Otherwise, the three points are not collinear, and step a4 is executed;
a3: judging whether P and the points Ri, Rj and Rk are on the same straight line; if so, judging the P point as the final positioning point corresponding to the data to be processed;
Otherwise, executing step a5;
If P and Ri, Rj and Rk are on the same straight line, P and the three points are necessarily on the same road, so the point P is the final positioning point of the accident;
a4: drawing a triangle with points Ri, Rj and Rk as vertices, recorded as: the positioning area;
judging whether the point P is in the positioning area;
If yes, judging the P point as the final positioning point corresponding to the data to be processed;
Otherwise, executing step a5;
If the P point is in the triangular area enclosed by Ri, Rj and Rk, the P point and the three points are necessarily on the same road, so the P point is the final positioning point of the accident;
a5: finding the point closest to P in R, recorded as: the closest point Rm;
The distance between Rm and P is recorded as: Dm;
If the point P is neither in the triangular area enclosed by Ri, Rj and Rk nor on the same straight line as the three points, the point P and the three points are not on the same road; the accident handling personnel may have used a nearby building or marker for positioning when entering the accident site.
a6: comparing Dm with a preset positioning threshold;
if Dm is smaller than the positioning threshold value, judging the Rm point as a final positioning point corresponding to the data to be processed;
Otherwise, judging that a final positioning point corresponding to the data to be processed cannot be found.
The positioning threshold is a preset distance value in meters. It is usually set to the average width of the road and the buildings beside the road; its specific value may differ between urban areas, or an empirical value may be estimated from historical data. In this embodiment, the positioning threshold is 10 m.
The point Rm nearest to the point P (the accident site entered by the accident handling personnel) is found on the road where the accident occurred, and the distance Dm between Rm and P is compared with the positioning threshold. If Dm is smaller than the positioning threshold, Rm can be regarded as the real accident site on the road;
Otherwise, if Dm is greater than or equal to the positioning threshold, Rm is too far from the point P; that is, the point P entered by the accident handling personnel is too far from the road they entered, the accident on that road cannot be located from the point P, and the final positioning point corresponding to the data to be processed cannot be found.
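The road projection calculation in steps a1 to a6 can be sketched as below. Several points are assumptions rather than part of this application: P and the road points are all given as [lat, lon] in decimal degrees, the collinearity and triangle tests treat latitude and longitude as planar coordinates (reasonable only over short distances), the tolerance eps is an arbitrary illustrative value, and the road is assumed to hold at least three points. All function names are hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Distance in meters between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat1 - lat2) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon1 - lon2) / 2) ** 2)
    return 6371 * 1000 * 2 * math.asin(math.sqrt(a))


def cross(o, a, b):
    """z-component of (a - o) x (b - o), treating [lat, lon] as planar."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def collinear(a, b, c, eps=1e-12):
    """True when the three points lie on one line, within an assumed tolerance."""
    return abs(cross(a, b, c)) <= eps


def in_triangle(p, a, b, c):
    """True when p lies inside or on the edge of triangle abc."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def locate(p, road, positioning_threshold_m=10.0):
    """Return the final positioning point, or None when it cannot be found."""
    # a1: distance set D and the three nearest road points.
    dists = [haversine_m(p[0], p[1], r[0], r[1]) for r in road]
    order = sorted(range(len(road)), key=dists.__getitem__)
    ri, rj, rk = (road[i] for i in order[:3])
    if collinear(ri, rj, rk):
        # a3: P is final when it lies on the same line as the three points.
        if collinear(ri, rj, p) and collinear(ri, rk, p):
            return p
    elif in_triangle(p, ri, rj, rk):
        # a4: P is final when it lies inside the positioning area.
        return p
    # a5 and a6: fall back to the nearest road point Rm if it is close enough.
    m = order[0]
    return road[m] if dists[m] < positioning_threshold_m else None
```

When locate returns None, the three nearest points can still be reported to the user as the possible accident area.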
In the embodiment shown in Fig. 6, point P is the lake-center building A. Suppose that the calculation in step a1 finds the three points closest to the lake-center building A in the set R corresponding to the ring lake road: reference point 1, the accident site, and reference point 2.
Reference point 1, the accident site and reference point 2 are not on the same straight line, and the lake-center building A is not in the triangular area enclosed by the three points, so the accident handling personnel evidently selected a point off the ring lake road as the positioning reference for the accident site.
The possible cases are as follows:
Case 1: if the accident site is the point closest to the lake-center building A and the distance between them is less than 10 m, the accident site is located accurately;
Case 2: if the accident site is the point closest to the lake-center building A and the distance between them is greater than or equal to 10 m, the point P is judged to be too far from the road entered by the accident handling personnel, and the final positioning point corresponding to the data to be processed cannot be found;
Case 3: if reference point 1 is the point closest to the lake-center building A and the distance between them is less than 10 m, reference point 1 is taken as the final positioning point corresponding to the data to be processed. Although reference point 1 and the accident site are not the same point, the average distance between longitude and latitude points on the ring lake road after the expansion point density operation is less than 20 m and both points are close to the lake-center building A, so in practical applications such a distance error is acceptable when the result is used as statistical data.
In specific applications, if a more accurate result is required, the accuracy of the final result can be controlled by adjusting the positioning threshold and the distance threshold. If the point P cannot be found, which indicates that the original data is wrong, an accurate prompt is given in step S7. If the point P does exist but the method concludes that the final positioning point corresponding to the data to be processed cannot be found, it can be inferred that the reference point entered by the accident handling personnel is too far from the actual accident site, and the triangular area enclosed by Ri, Rj and Rk is fed back to the user as the possible accident area. The method locates the accident site at a real point on the accident road and, even when an abnormal result occurs, narrows the possible range of the accident site; compared with other positioning methods in the prior art, it effectively improves the accuracy of accident site positioning for accident information described in natural language.
A natural language processing-based traffic accident location system, as shown in fig. 2, comprising: the system comprises an accident site analysis module 10, a road retrieval module 20, a road encryption module 30, a geocoding module 40 and a projection positioning module 50.
After the data to be processed is sent to the accident site analysis module 10, analyzing and obtaining accident positioning information corresponding to the data to be processed by the accident site analysis module 10 based on a stop word list and a regular expression for word segmentation; the accident positioning information includes: cities, roads, streets and places where accidents occur; and the accident positioning information is fed into the road retrieval module 20 and the geocoding module 40.
After the accident location analysis module 10 extracts the accident location information, it performs table standardization processing on the text of the accident location information, and unifies the data format of the accident location information into standardized location information: XX city XX region (county or county level city) XX street XX line XX.
The road retrieval module 20 uses a retrieval interface to retrieve road information of an accident road based on an offline map interface and takes city and road names as inputs, acquires corresponding digital road information data based on longitude and latitude description, and obtains an initial road longitude and latitude data set; and feeds the initial road longitude and latitude data set into the road encryption module 30.
In the road encryption module 30, it is judged whether or not the accident occurrence road is an information sparse road; if yes, performing expansion point density operation on the sparse initial road longitude and latitude data set, and expanding the too sparse digital road array into a denser array to obtain a to-be-processed road longitude and latitude data set R, R= [ R1, R2, … …, rn ]; otherwise, if the longitude and latitude points of the accident occurrence road are enough, the initial road longitude and latitude data set is directly recorded as: a longitude and latitude data set R of a road to be processed; and feeds the road longitude and latitude data set R to be processed into the projection positioning module 50.
In the geocoding module 40, the accident positioning information in the form of standardized location information inputted from the accident location analysis module 10 is converted into corresponding latitude and longitude coordinate points using the geocoding interface, and is recorded as: an original accident site P; and feeds the original incident P into the projection location module 50.
The geocoding module 40 further includes a reliability analysis operation, where when the geocoding interface returns the longitude and latitude coordinate points corresponding to the accident positioning information, the returned data is recorded as: data to be judged; the geocoding module 40 performs reliability analysis on the data to be judged, calculates confidence of the data to be judged, and if the confidence is greater than 60, considers the coding result to be reliable, and sets the longitude and latitude as an original accident site P; if not, refining the accident site text information, and re-sending the accident site text information to the geocoding interface until the confidence coefficient of the data to be judged is more than 60.
In practical applications, the reliability analysis can be implemented with reliability analysis techniques in the prior art. The geocoding interface provided by a Web service API can also return a confidence value together with the longitude and latitude coordinate point, and this confidence can be used directly when the accuracy requirement of the calculation is not high.
The operation of refining the accident site text information comprises the following steps: acquiring the text of accident positioning information of the last geocode interface, adding preset high-frequency address words one by one in corresponding fields of streets, roads and places, and inputting the geocode interface for repositioning after adding one high-frequency address word each time;
the high frequency address term includes: bus stops, communities, hu-he, dao, lane, department, fork, street, and road.
The operation of refining the accident site text information is specifically implemented as follows. Suppose the standardized location information sent to the geocoding module 40 by the accident location analysis module 10 is: City A, District B, Street C, D, and the confidence of the data to be judged, converted to a longitude and latitude coordinate point in the geocoding module 40, is 40. The geocoding module 40 then refines the standardized location information into: City A, District B, Street C, D bus stop platform, converts it to a coordinate point again through the geocoding interface, and performs reliability analysis on the newly obtained data to be judged. This is repeated until the confidence is greater than 60, which ensures the accuracy of the subsequent calculation.
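The refinement loop can be sketched as below. The geocode argument is a hypothetical callable standing in for the geocoding interface and is assumed to return a (point, confidence) pair; the list of high-frequency address words is supplied by the caller. Unlike the description, which repeats until the confidence exceeds 60, this sketch stops once every word has been tried, which is an added assumption to guarantee termination.

```python
def geocode_with_refinement(address, geocode, address_words, min_confidence=60):
    """Try the address as-is, then with each high-frequency word appended in turn."""
    candidates = [address] + [address + word for word in address_words]
    for candidate in candidates:
        point, confidence = geocode(candidate)
        if point is not None and confidence > min_confidence:
            # Reliable result: this point becomes the original accident site P.
            return point, candidate
    # No candidate reached the required confidence.
    return None, None
```

Passing the geocoder in as a parameter keeps the sketch independent of any particular map service and makes it easy to test with a stub.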
The projection positioning module performs road projection calculation based on the longitude and latitude data set R of the road to be processed, wherein the longitude and latitude data set R of the road to be processed meets the calculation requirement, calculates the position relationship between the P point and the road represented by the longitude and latitude data set R of the road to be processed, and further judges the final positioning point corresponding to the data to be processed.
In the method, the traffic accident site is required to be accurately positioned on the road, and the accurate road data is required to be acquired, and at present, the related electronic map supply unit provides a road retrieval scheme of an offline map, but the problem of sparse road density exists.
The number of the data after encryption processing is fixed, so the method is based on a temporary storage container for data processing by a stack, is easy to realize and has simple calculation process.
1.1: Importing accident data;
1.2: all accident data are stacked;
1.3: taking out the accident data of the stack top;
1.4: building a stop word list which, in addition to the commonly used Baidu stop words, contains words that commonly occur in traffic accident information such as road, place and junction; writing the regular expressions for word segmentation; and performing Chinese word segmentation on content in the accident information such as the case summary, accident place and basic facts;
1.5: extracting information of the accident city, county, street, road and accident site in the text information according to the word segmentation result;
1.6: inputting the acquired city and road information into a road retrieval interface of an offline electronic map, and acquiring a road longitude and latitude data set R of an accident, so as to complete a road retrieval function;
1.7: calculating the space distance between adjacent points in the longitude and latitude data set R of the road based on a space point distance algorithm;
1.8: calculating the average of the distances between all adjacent points, namely the average distance between adjacent points; if this value is greater than the distance threshold of 20 m, executing 1.9, otherwise executing 1.10;
1.9: first calculating the number inset_num of longitude and latitude coordinate points to be inserted, then interpolating the road data using the mean filling interpolation method;
The calculation method of the interpolation number between adjacent points is as follows:
inset_num = ceil(mean_dist/20)
wherein inset_num is the number of points to be inserted, ceil is an upward rounding algorithm, and mean_dist is the average distance between all adjacent points;
after interpolation of the road data, the new densified road data points R = [R1, R2, …, Rn] are obtained, where Ri = [lat_i, lon_i] (i = 1, 2, 3, …, n).
1.10: Judging whether the data in the accident data stack is empty or not, if not, repeatedly executing S1.3-S1.8;
If yes, the algorithm is ended, all roads where the accident places are located are obtained, and the roads meet the conditions.
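The stack-driven control flow of steps 1.2, 1.3 and 1.10 can be sketched as follows. The per-record work (segmentation, road retrieval, densification) is represented by a hypothetical handle callable, and the function name is also an assumption.

```python
def process_with_stack(records, handle):
    """Push every record onto a stack, then pop and process until it is empty."""
    stack = list(records)                # 1.2: all accident data are stacked
    results = []
    while stack:                         # 1.10: stop when the stack is empty
        record = stack.pop()             # 1.3: take out the record on top
        results.append(handle(record))   # 1.4 to 1.9 for this record
    return results
```

Because a stack is last-in, first-out, results come back in the reverse of the import order; a caller that needs the original order can reverse the returned list.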
In the application, the geographic coding module 40 can convert standardized place text information into longitude and latitude coordinate points based on the accident place, as shown in fig. 4, the specific implementation process is as follows;
2.1: importing a standardized traffic accident site data set and stacking;
2.2: taking out the position data of the stack top;
2.3: inputting data into a geocode interface address and initiating a network request;
2.4: detecting whether the returned result is abnormal or not, if so, setting the final result as a null value, and if not, carrying out the next step;
2.5: calculating confidence of the returned result;
2.6: if the confidence coefficient is larger than 60, the coding result is considered to be reliable, the longitude and latitude are set as the result, if not, the accident site text information is thinned, and the process is carried out again for 2.3-2.6;
2.7: piling the encoded data into a result stack;
2.8: the accident site data is popped;
2.9: judging whether the accident site data stack is empty or not, if not, re-executing 2.2-2.9, and if so, ending.
In the accident location projection positioning module 50, a location mapping method based on distance projection is adopted, so that the longitude and latitude of the traffic accident location, the result of which is not on the road, can be projected onto the road, and the final longitude and latitude coordinates are obtained, and the specific algorithm is described as follows:
3.1: stacking longitude and latitude points calculated by the accident site geocoding module 40;
3.2: taking out a stack top data point P;
3.3: based on the standardized accident site data obtained by analysis of the traffic accident site analysis module 10, acquiring a road where the accident site is located, and taking out a road data coordinate set R processed by the road retrieval module 20 and the road encryption module 30;
3.4: the distances between the P point and all the points in the R set are calculated by the following method:
3.5: according to D, taking out the 3 points Ri, Rj and Rk in R closest to P;
3.6: if Ri, Rj and Rk are on the same straight line, executing 3.7;
otherwise, executing 3.8;
3.7: judging whether P and the points Ri, Rj and Rk are on the same straight line; if so, judging the P point as the final positioning point corresponding to the data to be processed;
otherwise, executing 3.9;
3.8: judging whether P is in the triangular area enclosed by the three points Ri, Rj and Rk; if so, the P point is the final positioning point and the flow jumps directly to 3.11; if not, going to the next step 3.9;
3.9: according to D, the point Rm closest to P in R is taken out, and the distance dist between P and Rm is obtained from D;
3.10: if dist is less than 10m, rm is a final positioning point, and if not, the P point cannot acquire an accurate positioning point;
3.11: stacking the data into a new positioning stack, and popping the P point;
3.12: judging whether the original positioning site stack is empty or not, if not, re-executing the steps 3.2-3.11, and if so, ending the algorithm.
The technical scheme of this application approaches the problem from the perspective of traffic accident site positioning and addresses the difficulty of acquiring traffic accident road data and of locating the site accurately. On the basis of parsing the accident text information, it acquires the road longitude and latitude data set from an electronic map and provides a projection-mapping method for accurate accident site positioning, ensuring that the positioning point falls on the road where the accident occurred. The method can process traffic accident information whether it arrives in real time or comes from historical data. The text analysis method provided by this application can also serve as a reference for natural language processing in other machine learning applications.

Claims (8)

1.一种基于自然语言处理的交通事故地点定位方法,其包括以下步骤:1. A method for locating a traffic accident location based on natural language processing, comprising the following steps: S1:获取基于自然语言表达的待处理交通事故地点的信息数据,记作:待处理数据;S1: Obtain information data of the location of the traffic accident to be processed based on natural language expression, recorded as: data to be processed; 其特征在于,其还包括以下步骤:It is characterized in that it also includes the following steps: S2:制作面向交通事故地点信息分析用的停用词表,以及用于分词的正则表达式;S2: Create a stop word list for traffic accident location information analysis and a regular expression for word segmentation; S3:基于所述停用词表和正则表达式,对所述待处理数据进行分析,得到对应的事故定位用信息;S3: Analyze the data to be processed based on the stop word list and the regular expression to obtain corresponding accident location information; 所述事故定位用信息中包括:事故发生的城市、道路、街道和地点;The accident location information includes: the city, road, street and location where the accident occurred; S4:基于道路检索技术,获取事故发生道路对应的基于经纬度描述的数字化道路信息数据,得到描述事故发生道路的经纬度数据集,记作:初始道路经纬度数据集;S4: Based on the road retrieval technology, the digital road information data based on the longitude and latitude description corresponding to the road where the accident occurred is obtained, and the longitude and latitude data set describing the road where the accident occurred is obtained, which is recorded as: the initial road longitude and latitude data set; S5:基于所述初始道路经纬度数据集中的数据,判断事故发生道路是否为信息稀疏道路;S5: Based on the data in the initial road longitude and latitude data set, determine whether the road where the accident occurred is an information-sparse road; 如果是,则执行步骤S6;If yes, execute step S6; 否则,将所述初始道路经纬度数据集记作:待处理道路经纬度数据集R,执行步骤S7;Otherwise, the initial road longitude and latitude data set is recorded as the road longitude and latitude data set R to be processed, and step S7 is executed; R=[R1,R2,……,Rn],其中,n为数据集中包括的经纬度点的个数,取值为正整数;R=[R1, R2, ..., Rn], where n is the number of longitude and latitude points included in the data set, and is a positive integer; 
S6:将所述初始道路经纬度数据集进行扩充点密度操作,扩充数据集中的数据个数,得到数字化道路集合,并记作:待处理道路经纬度数据集R;S6: performing an expansion point density operation on the initial road longitude and latitude data set to expand the number of data in the data set to obtain a digital road set, which is recorded as: a road longitude and latitude data set R to be processed; S7:使用地理编码技术,将所述事故定位用信息中的事故地点文字信息解算成经纬度数字信息地点,记作:原始事故地点P;S7: using geocoding technology, the text information of the accident location in the accident location information is converted into latitude and longitude digital information location, recorded as: original accident location P; 如果可以找到所述原始事故地点P,则执行步骤S8;If the original accident location P can be found, execute step S8; 否则,无法找到P点,判断原始数据有误,返回错误信息,结束本次计算;Otherwise, point P cannot be found, the original data is judged to be incorrect, an error message is returned, and the calculation ends; S8:将所述原始事故地点P与所述待处理道路经纬度数据集R进行道路投影计算,判断所述待处理数据对应的最终定位点;S8: Performing road projection calculation on the original accident location P and the road latitude and longitude data set R to be processed, and determining the final positioning point corresponding to the data to be processed; 所述道路投影计算,具体包括以下步骤:The road projection calculation specifically includes the following steps: a1:计算P点与所述待处理道路经纬度数据集R集中所有点的空间距离,在R 中找到距离P最近3个点Ri、Rj、Rk;a1: Calculate the spatial distance between point P and all points in the road latitude and longitude dataset R to be processed, and find the three points Ri, Rj, Rk closest to P in R; a2:判断点Ri、Rj和Rk的位置关系,如果 Ri、Rj、Rk同在一条直线上,则执行步骤a3;a2: Determine the positional relationship between points Ri, Rj and Rk. If Ri, Rj and Rk are on the same straight line, execute step a3; 否则,三点不共线,执行步骤a4;Otherwise, the three points are not collinear, and step a4 is executed; a3:判断P与点Ri、Rj和Rk是否同在一条直线上,如果是,则判断P点为所述待处理数据对应的最终定位点;a3: Determine whether point P and points Ri, Rj and Rk are on the same straight line. 
If so, determine that point P is the final positioning point corresponding to the data to be processed; 否则,执行步骤a5;Otherwise, execute step a5; a4:以点Ri、Rj和Rk做端点画一个三角形,记作:定位区域;a4: Draw a triangle with points Ri, Rj and Rk as endpoints, denoted as: positioning area; 判断点P是否在所述定位区域之内;Determine whether point P is within the positioning area; 如果是,则判断P点为所述待处理数据对应的最终定位点;If yes, then point P is determined to be the final positioning point corresponding to the data to be processed; 否则执行步骤a5;Otherwise, execute step a5; a5:在R 中找到距离P最近的那个点,记作:最近点Rm;a5: Find the point in R that is closest to P, recorded as: the closest point Rm; Rm与P的距离记作:Dm;The distance between Rm and P is recorded as: Dm; a6:比较Dm与预设的定位阈值;a6: Compare Dm with the preset positioning threshold; 如果Dm<定位阈值,则判断Rm点为所述待处理数据对应的最终定位点;If Dm<positioning threshold, then point Rm is determined to be the final positioning point corresponding to the data to be processed; 否则,判断无法找到所述待处理数据对应的最终定位点;Otherwise, it is determined that the final positioning point corresponding to the data to be processed cannot be found; 步骤S6中,所述扩充点密度操作包括以下步骤:In step S6, the expansion point density operation includes the following steps: c1:确认所述初始道路经纬度数据集中的所有相邻点;c1: confirm all adjacent points in the initial road longitude and latitude dataset; c2:计算需要插入的点的插值数量,然后使用均值填充,在所有的相邻点之间进行插值,得到:中间数据集Rt;c2: Calculate the number of interpolation points that need to be inserted, and then use mean filling to interpolate between all adjacent points to obtain: the intermediate data set Rt; inset_num = ceil(mean_dist/距离阈值),inset_num = ceil(mean_dist/distance threshold), 其中,inset_num为需要插入的点的个数,ceil为向上取整算法,mean_dist为所有相邻点平均间距;距离阈值为预设的相邻点之间的理想间距;Among them, inset_num is the number of points to be inserted, ceil is the rounding up algorithm, mean_dist is the average distance between all adjacent points; the distance threshold is the preset ideal distance between adjacent points; c3:将中间数据集Rt记作:待处理道路经纬度数据集R;c3: record the intermediate data set Rt as the longitude and 
latitude data set R of the road to be processed;

In step S7, the method for locating the original accident location P comprises the following steps:
d1: select the online geocoding service interface of an online map, transcode the parsed city, road and accident location information, and write it into the online geocoding service interface;
d2: access the online geocoding service interface and obtain the returned JSON data;
d3: parse the JSON data and extract the longitude and latitude coordinate point from it to obtain the original accident location P;
P = [lon_P, lat_P], where lon denotes the longitude coordinate and lat denotes the latitude coordinate.

2. The method for locating a traffic accident location based on natural language processing according to claim 1, characterized in that the spatial distance between two points is calculated from their longitude and latitude as follows:
let the points involved in the calculation be x and y, where lon_x is the longitude of point x, lat_x is the latitude of point x, lon_y is the longitude of point y, and lat_y is the latitude of point y; then the distance dist from point x to point y is calculated as:
lat_x = lat_x × π/180;
lon_x = lon_x × π/180;
lat_y = lat_y × π/180;
lon_y = lon_y × π/180;
Δlat = lat_y - lat_x;
Δlon = lon_y - lon_x;
a = sin(Δlat / 2)^2 + cos(lat_y) * cos(lat_x) * sin(Δlon / 2)^2;
c = 2 * arcsin(sqrt(a));
dist = 6371 * c * 1000.

3. The method for locating a traffic accident location based on natural language processing according to claim 1, characterized in that, in step S5, whether a road is an information-sparse road is determined as follows:
b1: calculate the spatial distance between every pair of adjacent points in the road longitude and latitude data set to be processed;
b2: calculate the average of all these distances to obtain the average distance between adjacent points;
b3: compare the average distance between adjacent points with a preset distance threshold.

4. The method for locating a traffic accident location based on natural language processing according to claim 1, characterized in that, in step a6, when it is determined that the final positioning point corresponding to the data to be processed cannot be found, the positioning area with points Ri, Rj and Rk as its endpoints is fed back to the user as a reference area for the accident location.

5.
A traffic accident location positioning system based on natural language processing, used to implement the method for locating a traffic accident location according to claim 1, characterized in that it comprises: an accident location analysis module, a road retrieval module, a road densification module, a geocoding module and a projection positioning module;
after the data to be processed is sent to the accident location analysis module, the accident location analysis module parses it, based on a stop word list and regular expressions used for word segmentation, to obtain the accident location information corresponding to the data to be processed; the accident location information includes the city, road, street and place where the accident occurred; and the accident location information is sent to the road retrieval module and the geocoding module;
the road retrieval module retrieves road information for the road where the accident occurred through an offline map interface, obtains the corresponding digitized road information data described by longitude and latitude as the initial road longitude and latitude data set, and sends the initial road longitude and latitude data set to the road densification module;
in the road densification module, it is determined whether the road where the accident occurred is an information-sparse road; if so, a point density expansion operation is performed on the sparse initial road longitude and latitude data set, expanding the overly sparse digital road array into a denser array to obtain the road longitude and latitude data set R to be processed, R = [R1, R2, ..., Rn]; otherwise, if the road where the accident occurred has enough longitude and latitude points, the initial road longitude and latitude data set is used directly as the road longitude and latitude data set R to be processed; and the road longitude and latitude data set R to be processed is sent to the projection positioning module;
in the geocoding module, the accident location information passed in by the accident location analysis module is converted by a geocoding interface into the corresponding longitude and latitude coordinate point, recorded as the original accident location P; and the original accident location P is sent to the projection positioning module;
the projection positioning module performs a road projection calculation based on the received original accident location P and the road longitude and latitude data set R to be processed, and determines the final positioning point corresponding to the data to be processed.

6. The traffic accident location positioning system based on natural language processing according to claim 5, characterized in that the stop word list includes: location adverbs, and the words for intersection (路口), road (路), road section (路段) and place (处).

7. The traffic accident location positioning system based on natural language processing according to claim 5, characterized in that, after extracting the accident location information, the accident location analysis module standardizes the text of the accident location information.
8. The traffic accident location positioning system based on natural language processing according to claim 5, characterized in that: the geocoding module further includes a credibility analysis operation; when the geocoding interface returns the longitude and latitude coordinate point corresponding to the accident location information, the returned data is recorded as the data to be judged; the geocoding module performs a credibility analysis on the data to be judged and calculates its confidence; if the confidence is greater than 60, the geocoding result is considered reliable and the longitude and latitude are set as the original accident location P; if not, the accident location text information is refined and sent to the geocoding interface again, until the confidence of the data to be judged is greater than 60;
the operation of refining the accident location text information includes: obtaining the text of the accident location information last sent to the geocoding interface, and adding preset high-frequency address words one by one to the fields corresponding to street, road and place; each time one high-frequency address word is added, the text is sent to the geocoding interface for re-location.
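
For illustration only, the distance calculation of claim 2 and the sparsity test of claim 3 (steps b1 to b3) might be sketched in Python as below. The function names are hypothetical and not part of the claims. The sketch assumes coordinates are given in degrees as (lon, lat) pairs, that the result is in meters (6371 km multiplied by 1000), and that a road counts as information-sparse when the average spacing exceeds the threshold; the claims only state that the two values are compared.

```python
import math

EARTH_RADIUS_KM = 6371  # radius constant used in claim 2


def haversine_m(lon_x, lat_x, lon_y, lat_y):
    """Distance between points x and y, following the formula of claim 2."""
    # Convert degrees to radians (the "× π/180" steps).
    lat_x, lon_x, lat_y, lon_y = map(math.radians, (lat_x, lon_x, lat_y, lon_y))
    d_lat = lat_y - lat_x
    d_lon = lon_y - lon_x
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(lat_y) * math.cos(lat_x) * math.sin(d_lon / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error (a safeguard added here,
    # not part of the claim).
    c = 2 * math.asin(math.sqrt(min(1.0, a)))
    return EARTH_RADIUS_KM * c * 1000


def mean_adjacent_distance(road):
    """Steps b1 and b2: average distance between consecutive road points."""
    if len(road) < 2:
        raise ValueError("at least two road points are required")
    dists = [haversine_m(*p, *q) for p, q in zip(road, road[1:])]
    return sum(dists) / len(dists)


def is_sparse(road, threshold_m):
    """Step b3. Assumption: sparse means average spacing above the threshold."""
    return mean_adjacent_distance(road) > threshold_m
```

Here `road` stands for the data set R as a list of (lon, lat) tuples, and `threshold_m` for the preset distance threshold, whose value is not given in the claims.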
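
Steps d2 and d3 together with the confidence check of claim 8 could be sketched as follows. This is a simplified illustration under several assumptions: `geocode` is a hypothetical callable standing in for the online geocoding service interface, the JSON key names (`result`, `location`, `lng`, `lat`, `confidence`) are an assumed response layout rather than the schema of any particular service, high-frequency address words are appended to the whole address text instead of to the separate street, road and place fields, and the loop gives up when the word list is exhausted, which the claim does not specify.

```python
import json

CONFIDENCE_THRESHOLD = 60  # threshold stated in claim 8


def parse_response(raw_json):
    """Steps d2 and d3: extract (lon, lat) and the confidence from the reply.

    The key names are an assumed layout, not a real service's schema.
    """
    result = json.loads(raw_json)["result"]
    location = result["location"]
    return (location["lng"], location["lat"]), result["confidence"]


def locate(address, geocode, high_freq_words):
    """Return P = (lon, lat) once the confidence exceeds 60, else None."""
    point, confidence = parse_response(geocode(address))
    if confidence > CONFIDENCE_THRESHOLD:
        return point
    for word in high_freq_words:
        # Refine the text last sent by adding one high-frequency word,
        # then query the geocoding interface again.
        address = address + word
        point, confidence = parse_response(geocode(address))
        if confidence > CONFIDENCE_THRESHOLD:
            return point
    return None  # no reliable result; stopping here is an assumption
```

Passing the geocoder in as a callable keeps the sketch independent of any specific map provider and makes the retry logic easy to exercise with a stub.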
CN202310961238.XA 2023-08-02 2023-08-02 A method and system for locating traffic accident locations based on natural language processing Active CN116992858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310961238.XA CN116992858B (en) 2023-08-02 2023-08-02 A method and system for locating traffic accident locations based on natural language processing

Publications (2)

Publication Number Publication Date
CN116992858A CN116992858A (en) 2023-11-03
CN116992858B true CN116992858B (en) 2024-08-30

Family

ID=88524351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310961238.XA Active CN116992858B (en) 2023-08-02 2023-08-02 A method and system for locating traffic accident locations based on natural language processing

Country Status (1)

Country Link
CN (1) CN116992858B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109377739A (en) * 2018-11-30 2019-02-22 公安部交通管理科学研究所 A method for obtaining positioning information in traffic accident alarming and asking for help
CN109993971A (en) * 2019-03-27 2019-07-09 江苏智通交通科技有限公司 A method of promoting traffic accident site locating accuracy

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102020122010B4 (en) * 2020-08-24 2023-05-04 Bareways GmbH METHOD AND SYSTEM FOR DETERMINING A CONDITION OF A GEOGRAPHIC LINE
CN112035518B (en) * 2020-08-28 2022-07-15 深圳平安医疗健康科技服务有限公司 Method and device for judging major accident occurrence place and computer equipment

Also Published As

Publication number Publication date
CN116992858A (en) 2023-11-03

Similar Documents

Publication Publication Date Title
EP3971731B1 (en) Fence address-based coordinate data processing method and apparatus, and computer device
US7912879B2 (en) Method for applying clothoid curve values to roadways in a geographic data information system
CN104634352B (en) A kind of road matching method merged based on Floating Car motion track and electronic chart
US20160102987A1 (en) Method for inferring type of road segment
US20060041375A1 (en) Automated georeferencing of digitized map images
CN102918358A (en) A method of resolving a location from data representative thereof
CN111625732A (en) Address matching method and device
Qin et al. Intelligent geocoding system to locate traffic crashes
US20220157167A1 (en) System for offsite navigation
CN112052908B (en) A traffic accident location clustering method and system
CN103514235A (en) Method and device for establishing incremental code library
CN113360587B (en) Land surveying and mapping equipment and method based on GIS technology
Bordes et al. Road modeling based on a cartographic database for aerial image interpretation
Lei Geospatial data conflation: A formal approach based on optimization and relational databases
CN117455032A (en) Road network planning and optimization method and system based on geographical model
CN116992858B (en) A method and system for locating traffic accident locations based on natural language processing
Li et al. An automatic extraction method of coach operation information from historical trajectory data
CN118585832B (en) Vehicle track recognition method, device, equipment, storage medium and program product
Touya Multi-criteria geographic analysis for automated cartographic generalization
Kaushik et al. Coupled approximation of US driving speed and volume statistics using spatial conflation and temporal disaggregation
Fan et al. Lane‐Level Road Map Construction considering Vehicle Lane‐Changing Behavior
Xi et al. Improved dynamic time warping algorithm for bus route trajectory curve fitting
Du et al. A novel semantic recognition framework of urban functional zones supporting urban land structure analytics based on open‐source data
CN113392987A (en) Multi-source data fusion-based bus station position repairing method
CN111914538A (en) Intelligent space matching method and system for channel announcement information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant