CN110807070A - Road condition information extraction method based on neural network - Google Patents
- Publication number
- CN110807070A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
Abstract
A road condition information extraction method based on a neural network comprises the following steps: the system takes a passage of text as input and outputs structured road condition information; a classification model determines whether the text contains road condition information, and if it does not, processing ends; POIs and EVENTs are extracted from the text using an existing POI information base and an EVENT information base; the POIs and EVENTs are formed into candidate pairs <entity 1, entity 2, context>; a neural-network-based relation model determines the relation of each candidate pair; related candidate pairs are linked to form complete road condition information; and structured traffic information <place, event> is output. The invention reduces the number and complexity of manually constructed features, avoids conflicts between new and existing features when the system is extended, lowers maintenance cost, improves extensibility, and substantially improves recall.
Description
Technical Field
The invention relates to a road condition information extraction method based on a neural network, and belongs to the field of intelligent semantic understanding and speech recognition.
Background
With economic development, the number of urban vehicles has grown and urban traffic congestion has become increasingly serious, so citizens urgently need good route planning when they travel. The collection of road traffic information is an important foundation of urban traffic information services. Road traffic information collection falls mainly into two categories: collection of traffic flow information and collection of traffic events. Methods for the former mainly include floating-car traffic data collection, video surveillance systems, collection systems based on sensors such as microwave and radar detectors, and collection of users' travel information through mobile apps; their timeliness and accuracy have reached a high level.
The latter, however, covers many situations such as road traffic accidents, road construction, traffic control, and natural disasters. These events are sudden, diverse, and complex, and are difficult for machines to collect. At present they are collected mainly through user reports: according to statistics, more than 80% of the real-time traffic event data collected in 2016 came from user reports. In addition, information can be extracted from text obtained by web crawlers (from traffic management websites and microblogs) and from transcriptions of traffic radio broadcasts. Structured text can already be handled by machines. Unstructured text, which is mostly spoken language such as speech recognition output and microblog posts, requires semantic understanding to extract the correct location descriptions, event types, times of occurrence, and so on.
In recent years, deep neural networks and related technologies have developed rapidly in image processing, speech recognition, natural language processing, and other fields. Extracting information from road condition text is a complex natural language processing problem. The GRU (Gated Recurrent Unit) and the attention mechanism from deep neural networks have been applied to a range of natural language processing tasks, including Chinese word segmentation, text classification, named entity recognition, entity relation extraction, and word vector representation, and have substantially improved the metrics of these tasks. The invention applies deep neural network technology to a road condition information extraction system.
Unstructured text data has several characteristics: spoken-language descriptions are complex and varied, the location and event descriptions of multiple road condition reports are intermixed, and understanding often requires external geographic information. Past methods for extracting information from such text mainly used manually constructed rule bases or traditional machine learning methods such as SVMs. Both require many experts to build the rule or feature library, which is costly; the features are closely tied to the city, the speaker, the season, the specific scenario, and so on, so extensibility is poor; and although the accuracy of these models is high, the recall of road condition extraction is low.
Disclosure of Invention
To address these problems in the prior art, the invention combines simple features with a neural network, thereby reducing the number and complexity of manually constructed features. Specifically, the invention adopts the following technical scheme:
A road condition information extraction method based on a neural network comprises the following steps:
I. The input of the whole system is a passage of text that may or may not contain road condition information, and the output is structured road condition information;
II. The input text is classified by combining keywords, rules, and an SVM; the classification criterion is whether the text contains valid road condition information, and only texts that do are processed further;
III. POIs and EVENTs are extracted from the text using the existing geographic location information base and road condition EVENT information base, forming complete location candidate pairs <POI1, POI2> and location-road-condition candidate pairs <POI, EVENT>;
IV. The relation of each candidate pair is determined by combining a neural network model, manual rules, and introduced external information, forming complete road condition event information.
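Steps II and III can be illustrated with a minimal Python sketch. It assumes whitespace-tokenized English text, a hypothetical trigger-word list, and a hand-set linear score standing in for the SVM (a trained linear SVM likewise decides by the sign of a weighted sum plus a bias); the information bases are plain sets of names. Each candidate pair carries the whole token list as its context, and the POI is always placed first so that pairs take the form <POI1, POI2> or <POI, EVENT>.

```python
from itertools import combinations

# Hypothetical trigger words and weights; a trained SVM would supply real weights.
TRIGGER_WORDS = {"accident", "congestion", "closed", "queue"}
WEIGHTS = {"accident": 1.5, "congestion": 1.2, "closed": 1.0, "queue": 0.8}
BIAS = -1.0


def contains_road_condition(tokens):
    """Step II: a keyword rule rejects early, then a linear score decides."""
    if not TRIGGER_WORDS.intersection(tokens):
        return False  # rule: no trigger word at all, treat as a false alarm
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens) + BIAS
    return score > 0.0


def extract_entities(tokens, poi_base, event_base):
    """Step III: look each token up in the POI and EVENT information bases."""
    entities = []
    for i, tok in enumerate(tokens):
        if tok in poi_base:
            entities.append(("POI", tok, i))
        elif tok in event_base:
            entities.append(("EVENT", tok, i))
    return entities


def candidate_pairs(entities, tokens):
    """Form <POI1, POI2> and <POI, EVENT> pairs, each with the sentence as context."""
    context = tuple(tokens)
    pairs = []
    for a, b in combinations(entities, 2):
        if a[0] == "EVENT" and b[0] == "POI":
            a, b = b, a  # keep the POI first
        if a[0] == "POI":  # EVENT-EVENT pairs are skipped
            pairs.append((a, b, context))
    return pairs


tokens = ["Xizhimen", "to", "Jishuitan", "congestion"]
if contains_road_condition(tokens):
    ents = extract_entities(tokens, {"Xizhimen", "Jishuitan"}, {"congestion"})
    for a, b, _ in candidate_pairs(ents, tokens):
        print(a[1], "<->", b[1])
```

A real system would match multi-word place names against the information bases after word segmentation rather than looking up single tokens.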
Step IV comprises the following specific steps:
A. The text is segmented into words; segmentation uses a manually constructed keyword library, where keywords are words in spoken language that are key to matching road segment information and event information;
B. Road segment information matching and event information matching are performed by a neural network.
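Step A does not name a segmentation algorithm. As one possibility, the sketch below uses forward maximum matching, a classic dictionary-based method, to show why merging the keyword library into the lexicon matters. The lexicon contents are hypothetical examples, and a production segmenter would typically load the keyword library as a user dictionary instead.

```python
# Hypothetical lexicons: a base vocabulary plus the manually built keyword library.
BASE_LEXICON = {"西直门", "积水潭", "拥堵"}
KEYWORD_LIBRARY = {"排队", "队尾", "没有"}  # e.g. queue words and negation words


def fmm_segment(text, lexicon):
    """Forward maximum matching: at each position take the longest lexicon entry."""
    max_len = max((len(w) for w in lexicon), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # single characters are the fallback
                words.append(piece)
                i += size
                break
    return words


print(fmm_segment("西直门排队到积水潭", BASE_LEXICON | KEYWORD_LIBRARY))
print(fmm_segment("西直门排队到积水潭", BASE_LEXICON))
```

With the keyword library the first call yields 西直门 / 排队 / 到 / 积水潭; without it, the second call splits 排队 ("queue up") into the single characters 排 and 队.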
The neural network matching in step B comprises the following steps:
b1. Given two entities (POI-POI or POI-EVENT) and their context, determine their relation: if both entities are POIs, determine whether they form the location description of the same road condition event, i.e. whether the relation is Road-Start, Start-End, or unrelated; if the entities are a POI and an EVENT, determine whether they form a road condition event, i.e. whether they are related or unrelated;
b2. Represent the words in the context with word embeddings pre-trained on large-scale text. The pre-training text comprises Wikipedia and collected, annotated road condition text. The entities to be predicted are replaced by the tokens POI and EVENT, two features (location type and event type) are introduced to represent them, and special words with the same meaning are replaced by a unified word vector;
b3. Encode each word according to its position relative to the entities in the context (position encoding);
b4. Preprocess the text and the entities, and use some simple context features as model input;
b5. Concatenate the word embedding and the position encoding into an input vector;
b6. Encode the input vector sequence with a bidirectional GRU encoder to obtain context representation vectors;
b7. Weight the encoded context representation vectors with an attention mechanism;
b8. Concatenate the weighted context representation vector with the manually constructed context features;
b9. Classify the entity relation with a fully connected network.
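Steps b2, b3, and b5 can be sketched as follows, assuming a tokenized sentence, a hypothetical table that maps same-meaning special words to one shared token, and relative positions clipped to a fixed window (the clipping range `MAX_DIST` is an assumption, not a value from the patent). For brevity the sketch replaces each entity only with its coarse type token and leaves out the separate location-type and event-type features described in b2. Vectors are plain Python lists, so the concatenation in b5 is list addition.

```python
# Hypothetical table: special words with the same role share one token (and one vector).
SPECIAL_WORDS = {
    "to": "<SEG_LINK>", "until": "<SEG_LINK>",               # start-end linkers
    "road": "<PLACE_SUFFIX>", "junction": "<PLACE_SUFFIX>",  # words after a place name
}
MAX_DIST = 5  # assumed clipping window for relative positions


def prepare_tokens(tokens, entity_types):
    """b2: replace entities by their type token and normalize special words.

    entity_types maps a token index to "POI" or "EVENT".
    """
    return [entity_types.get(i, SPECIAL_WORDS.get(tok, tok)) for i, tok in enumerate(tokens)]


def relative_positions(n_tokens, e1, e2, max_dist=MAX_DIST):
    """b3: offset of every token from entity 1 and entity 2, clipped to +/-max_dist."""
    def clip(d):
        return max(-max_dist, min(max_dist, d))
    return [(clip(i - e1), clip(i - e2)) for i in range(n_tokens)]


def input_vectors(tokens, positions, word_vec, pos_vec):
    """b5: concatenate the word vector with both position vectors for each token."""
    return [word_vec[t] + pos_vec[p1] + pos_vec[p2] for t, (p1, p2) in zip(tokens, positions)]


tokens = ["Xizhimen", "to", "Jishuitan", "congestion"]
norm = prepare_tokens(tokens, {0: "POI", 2: "POI", 3: "EVENT"})
pos = relative_positions(len(tokens), 0, 2)
word_vec = {"POI": [1.0, 0.0], "EVENT": [1.0, 1.0], "<SEG_LINK>": [0.0, 1.0]}
pos_vec = {d: [float(d)] for d in range(-MAX_DIST, MAX_DIST + 1)}
print(norm, pos, input_vectors(norm, pos, word_vec, pos_vec)[1])
```

In a trained model the word vectors come from the pre-trained embeddings and the position vectors are learned; the toy tables here only make the shapes visible.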
By implementing this technical scheme, the number and complexity of manually constructed features are reduced. For example, the prior art required about 30 people to manually design features for four major cities (Beijing, Shanghai, Shenzhen, and Shenyang); the present method avoids this labor cost entirely. At the same time, extending the system does not introduce conflicts between new and existing features, which reduces model maintenance cost and enhances extensibility. With accuracy on par with or slightly above that of traditional machine learning with elaborate feature engineering, recall is substantially improved, by about 15%.
Drawings
FIG. 1 is a general flow diagram of the system;
FIG. 2 is a detailed flow of the relationship determination model.
Detailed Description
Description of related terms:
POI: a place word containing a geographic description. There are 3 types of POI (point of interest): Road (road name), Start (starting point of a road segment), and End (ending point of a road segment);
EVENT: a description of a traffic event;
When the entity pair is POI-POI, there are three types of relation: Road-Start, Start-End, and unrelated. If one entity pair has a Road-Start relation and another has a Start-End relation, and the Start of both pairs is the same place, the two pairs together are considered to form a Road-Start-End relation;
When the entity pair is POI-EVENT, there are two types of relation: location-event related and location-event unrelated.

Specific embodiments of the invention are now described in detail with reference to the accompanying drawings. A road condition information extraction method based on a neural network mainly comprises the following implementation steps:
I. A passage of text that may or may not contain road condition information is input into the whole system, and the output is structured road condition information;
II. The input text is classified by combining keywords, rules, and an SVM (support vector machine); the classification criterion is whether the text contains valid road condition information, and only texts that do are processed further. The main purpose of this step is to suppress false alarms, for example those caused by location and event words that appear in the text because of speech recognition errors;
III. POIs and EVENTs are extracted from the text using the existing geographic location information base and road condition EVENT information base, forming complete location candidate pairs <POI1, POI2> and location-road-condition candidate pairs <POI, EVENT>;
IV. The relation of each candidate pair is determined by combining a neural network model, manual rules, and introduced external information, forming complete road condition event information:
A. The text is segmented into words; segmentation uses a manually constructed keyword library, where keywords are words in spoken language that are key to matching road segment information and event information, such as separator words and negation words;
B. Road segment information matching and event information matching are performed by the neural network shown in FIG. 2:
b1. Given two entities (POI-POI or POI-EVENT) and their context, determine their relation: if both entities are POIs, determine whether they form the location description of the same road condition event, i.e. whether the relation is Road-Start, Start-End, or unrelated; if the entities are a POI and an EVENT, determine whether they form a road condition event, i.e. whether they are related or unrelated;
b2. Represent the words in the context with word embeddings pre-trained on large-scale text. The pre-training text comprises Wikipedia and collected, annotated road condition text. The entities to be predicted (place words and events) are replaced by the tokens POI and EVENT, and two features (location type and event type) are introduced to represent them. Special words with the same meaning are replaced by a unified word vector: for example, words between entities that express the start-end relation of a road segment, such as "to", "queue up to", and "end of the queue", and words following an entity that denote roads, intersections and other nodes, sentry boxes, traffic lights, and the like. The purpose is to address the out-of-vocabulary problem caused by abbreviated names and similar variation;
b3. Encode each word according to its position relative to the entities in the context (position encoding);
b4. Preprocess the text and the entities, and use some simple context features as model input. These are features that need no complex design, such as whether the two geographic entities are road nodes (determined from the map nodes), the distance between the entities, the order of the entities, the number of places and events between the entities, and whether a separator word appears between the entities;
b5. Concatenate the word embedding and the position encoding into an input vector;
b6. Encode the input vector sequence with a bidirectional GRU encoder to obtain context representation vectors;
b7. Weight the encoded context representation vectors with an attention mechanism;
b8. Concatenate the weighted context representation vector with the manually constructed context features;
b9. Classify the entity relation with a fully connected network.
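The model of steps b6 to b9 can be illustrated with a forward pass written in plain Python, so that it runs without a deep learning framework. It is a sketch under stated assumptions, not the patented model: the GRU uses the original formulation in which the reset gate multiplies the previous hidden state before the recurrent matrix, attention scores each time step as q · tanh(h_t) with a query vector q, and all weights are random placeholders rather than trained parameters. Class names, dimensions, and the seed are hypothetical.

```python
import math
import random


def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs):
    """Element-wise sum of equally long vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRU:
    """One GRU direction: n = tanh(W x + U (r * h) + b), h' = (1 - z) * n + z * h."""

    def __init__(self, in_dim, hid, rng):
        self.hid = hid
        self.W = {g: rand_matrix(hid, in_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hid, hid, rng) for g in "zrn"}
        self.b = {g: [0.0] * hid for g in "zrn"}

    def gate(self, g, x, h, act):
        return [act(v) for v in add(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])]

    def run(self, xs):
        h, outputs = [0.0] * self.hid, []
        for x in xs:
            z = self.gate("z", x, h, sigmoid)  # update gate
            r = self.gate("r", x, h, sigmoid)  # reset gate
            n = self.gate("n", x, [ri * hi for ri, hi in zip(r, h)], math.tanh)
            h = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
            outputs.append(h)
        return outputs


class RelationClassifier:
    def __init__(self, in_dim, hid, n_feats, n_classes, seed=0):
        rng = random.Random(seed)
        self.fwd, self.bwd = GRU(in_dim, hid, rng), GRU(in_dim, hid, rng)
        self.query = [rng.uniform(-0.1, 0.1) for _ in range(2 * hid)]
        self.W_out = rand_matrix(n_classes, 2 * hid + n_feats, rng)
        self.b_out = [0.0] * n_classes

    def forward(self, xs, feats):
        # b6: run both directions; re-reverse the backward outputs to align with xs.
        fw = self.fwd.run(xs)
        bw = self.bwd.run(xs[::-1])[::-1]
        hs = [f + b for f, b in zip(fw, bw)]  # list concatenation -> 2*hid per token
        # b7: attention weights softmax(q . tanh(h_t)), then the weighted sum of h_t.
        alphas = softmax([sum(q * math.tanh(v) for q, v in zip(self.query, h)) for h in hs])
        ctx = [sum(a * h[k] for a, h in zip(alphas, hs)) for k in range(len(hs[0]))]
        # b8: append the hand-crafted context features.
        final = ctx + list(feats)
        # b9: fully connected layer, softmax over relation labels.
        return softmax(add(matvec(self.W_out, final), self.b_out))


labels = ["Road-Start", "Start-End", "unrelated"]  # POI-POI relation labels
model = RelationClassifier(in_dim=4, hid=3, n_feats=5, n_classes=len(labels))
xs = [[0.1, 0.2, 0.3, 0.4], [0.0, 1.0, 1.0, -1.0], [0.5, 0.5, 0.0, 0.0]]
print(dict(zip(labels, model.forward(xs, [1.0, 1.0, 4.0, 1.0, 0.0]))))
```

In practice a deep learning framework would provide the bidirectional GRU and train the weights with a cross-entropy loss over the relation labels; the plain-Python version only makes the data flow from b6 to b9 explicit.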
The implementation steps, with reference to FIG. 1, are as follows:
1) The system takes a passage of text as input and outputs structured road condition information;
2) A classification model determines whether the text contains road condition information; if it does not, processing ends without extraction;
3) POIs and EVENTs are extracted from the text using an existing POI information base and an EVENT information base;
4) The POIs and EVENTs are formed into multiple relation candidate pairs <entity 1, entity 2, context>;
5) A neural-network-based relation model determines the relation of each candidate pair: if entities 1 and 2 are both POIs, it determines whether the relation is Start-End, Road-Start, or unrelated; if entity 1 is a POI and entity 2 is an EVENT, it determines whether the place-road-condition relation is related or unrelated;
6) Related candidate pairs are linked to form complete road condition information;
7) Structured traffic information <place, event> is output.
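Steps 6) and 7) can be sketched as follows, assuming the relation model has already labeled every candidate pair with one of `Road-Start`, `Start-End`, `related`, or `unrelated`. Following the definition of the Road-Start-End relation given in the description of terms, a Road-Start pair and a Start-End pair that share the same Start are merged into one road segment; each EVENT related to a POI is then attached to the segment containing that POI. The function names and place names are illustrative only.

```python
def build_segments(relations):
    """Merge a Road-Start pair and a Start-End pair sharing the same Start into one segment."""
    road_of = {b: a for a, b, label in relations if label == "Road-Start"}
    end_of = {a: b for a, b, label in relations if label == "Start-End"}
    starts = list(dict.fromkeys(list(road_of) + list(end_of)))  # unique, first-seen order
    return [(road_of.get(s), s, end_of.get(s)) for s in starts]


def assemble(relations):
    """Attach each EVENT related to a POI to the segment containing that POI."""
    segments = build_segments(relations)
    output = []
    for poi, event, label in relations:
        if label != "related":
            continue
        segment = next((seg for seg in segments if poi in seg), (poi,))
        place = tuple(p for p in segment if p is not None)  # drop missing parts
        output.append((place, event))
    return output


relations = [
    ("West 2nd Ring Road", "Xizhimen", "Road-Start"),
    ("Xizhimen", "Jishuitan", "Start-End"),
    ("Jishuitan", "congestion", "related"),
    ("Deshengmen", "congestion", "unrelated"),
]
for place, event in assemble(relations):
    print("<" + " / ".join(place) + ", " + event + ">")
```

A POI that belongs to no merged segment is output on its own, so an event related to an isolated landmark still yields a <place, event> record.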
The implementation process shown in FIG. 2 is as follows:
1) Input a relation candidate pair <entity 1, entity 2, context>;
2) Perform Chinese word segmentation on the context;
3) Represent the words with pre-trained word vectors;
4) Preprocess words with special meanings;
5) Encode each word according to its position relative to the entities in the context, forming a position vector;
6) Concatenate the word vector and the position vector into the input vector of the neural network;
7) Generate a representation of the context using a bidirectional GRU (Bi-GRU) and an attention mechanism;
8) Combine feature information (such as whether two POIs have an intersection point on the map) with the context representation to form the final representation of the relation pair;
9) Classify the final representation of the relation pair with a fully connected neural network;
10) Output the classification result.
Implementing the technical scheme of the invention brings the following technical effects:
1) Apart from the external information it needs to introduce, the neural-network-based model does not require the complex rules or feature libraries of traditional methods, which reduces cost.
Typical features used in rule-based methods and traditional machine learning include, but are not limited to, the following:
The present invention reduces the required features to the following basic features only:
No. | Feature |
---|---|
1 | Whether the two POIs are (road) nodes |
2 | Distance between the two POIs |
3 | Number of POIs between the two POIs |
4 | Number of EVENTs between the two POIs |
5 | POI types |
2) If the system needs to be extended to new cities, data sources, and so on, only new training data must be collected to retrain the model. This avoids conflicts between new and old rule features and the associated maintenance, and enhances extensibility;
3) The system of the invention substantially improves the recall of road condition information extraction, by about 15%, while keeping accuracy comparable to that of other methods.
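The basic features in the table under effect 1) are simple enough to compute directly from the extracted entities. The sketch below assumes entities are stored as `(kind, name, token_index, poi_type)` tuples in text order, that node status comes from a hypothetical set of map node names, and that distance is measured in tokens; the patent does not state which distance measure it uses.

```python
def basic_features(entities, i, j, map_nodes):
    """Basic features for the POI pair entities[i], entities[j].

    entities: (kind, name, token_index, poi_type) tuples in text order.
    map_nodes: hypothetical set of names that are road nodes on the map.
    """
    a, b = entities[i], entities[j]
    lo, hi = sorted((i, j))
    between = entities[lo + 1:hi]
    return {
        "are_nodes": (a[1] in map_nodes, b[1] in map_nodes),           # feature 1
        "distance": abs(a[2] - b[2]),                                  # feature 2, in tokens
        "pois_between": sum(1 for e in between if e[0] == "POI"),      # feature 3
        "events_between": sum(1 for e in between if e[0] == "EVENT"),  # feature 4
        "poi_types": (a[3], b[3]),                                     # feature 5
    }


entities = [
    ("POI", "Xizhimen", 0, "Start"),
    ("EVENT", "accident", 1, None),
    ("POI", "Chegongzhuang", 3, "Start"),
    ("POI", "Jishuitan", 5, "End"),
]
print(basic_features(entities, 0, 3, {"Xizhimen", "Jishuitan"}))
```

These values are what step b8 concatenates with the attention-weighted context vector before classification.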
Claims (3)
1. A road condition information extraction method based on a neural network, comprising the following steps:
I. The input of the whole system is a passage of text that may or may not contain road condition information, and the output is structured road condition information;
II. The input text is classified by combining keywords, rules, and an SVM; the classification criterion is whether the text contains valid road condition information, and only texts that do are processed further;
III. POIs and EVENTs are extracted from the text using the existing geographic location information base and road condition EVENT information base, forming complete location candidate pairs <POI1, POI2> and location-road-condition candidate pairs <POI, EVENT>;
IV. The relation of each candidate pair is determined by combining a neural network model, manual rules, and introduced external information, forming complete road condition event information.
2. The road condition information extraction method based on a neural network according to claim 1, wherein step IV comprises the following specific steps:
A. The text is segmented into words; segmentation uses a manually constructed keyword library, where keywords are words in spoken language that are key to matching road segment information and event information;
B. Road segment information matching and event information matching are performed by a neural network.
3. The road condition information extraction method based on a neural network according to claim 2, wherein the neural network matching in step B comprises the following steps:
b1. Given two entities (POI-POI or POI-EVENT) and their context, determine their relation: if both entities are POIs, determine whether they form the location description of the same road condition event, i.e. whether the relation is Road-Start, Start-End, or unrelated; if the entities are a POI and an EVENT, determine whether they form a road condition event, i.e. whether they are related or unrelated;
b2. Represent the words in the context with word embeddings pre-trained on large-scale text. The pre-training text comprises Wikipedia and collected, annotated road condition text. The entities to be predicted are replaced by the tokens POI and EVENT, two features (location type and event type) are introduced to represent them, and special words with the same meaning are replaced by a unified word vector;
b3. Encode each word according to its position relative to the entities in the context (position encoding);
b4. Preprocess the text and the entities, and use some simple context features as model input;
b5. Concatenate the word embedding and the position encoding into an input vector;
b6. Encode the input vector sequence with a bidirectional GRU encoder to obtain context representation vectors;
b7. Weight the encoded context representation vectors with an attention mechanism;
b8. Concatenate the weighted context representation vector with the manually constructed context features;
b9. Classify the entity relation with a fully connected network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911023161.1A CN110807070A (en) | 2019-10-25 | 2019-10-25 | Road condition information extraction method based on neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110807070A (en) | 2020-02-18 |
Family
ID=69489108
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911023161.1A Pending CN110807070A (en) | 2019-10-25 | 2019-10-25 | Road condition information extraction method based on neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110807070A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106504746A (en) * | 2016-10-28 | 2017-03-15 | Puqiang Information Technology (Beijing) Co., Ltd. | Method for extracting structured traffic information from speech data |
JP2017208045A (en) * | 2016-05-20 | 2017-11-24 | Nippon Telegraph and Telephone Corporation | Characteristic understanding device, method, and program |
US20180196881A1 (en) * | 2017-01-06 | 2018-07-12 | Microsoft Technology Licensing, Llc | Domain review system for identifying entity relationships and corresponding insights |
CN108875007A (en) * | 2018-06-15 | 2018-11-23 | Tencent Technology (Shenzhen) Co., Ltd. | Method and apparatus for determining a point of interest, storage medium, and electronic device |
CN109902145A (en) * | 2019-01-18 | 2019-06-18 | Institute of Information Engineering, Chinese Academy of Sciences | A method and system for joint entity relation extraction based on attention mechanism |
Non-Patent Citations (1)
Title |
---|
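Ma Yudan et al.: "Relation extraction method combining entity co-occurrence information and sentence semantic features", Scientia Sinica Informationis * |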
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200218 |