
CN110807070A - Road condition information extraction method based on neural network - Google Patents

Road condition information extraction method based on neural network

Info

Publication number
CN110807070A
Authority
CN
China
Prior art keywords
event
road condition
information
poi
condition information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911023161.1A
Other languages
Chinese (zh)
Inventor
杨喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Puqiang Information Technology (beijing) Co Ltd
Original Assignee
Puqiang Information Technology (beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Puqiang Information Technology (beijing) Co Ltd
Priority to CN201911023161.1A
Publication of CN110807070A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Traffic Control Systems (AREA)

Abstract

A road condition information extraction method based on a neural network comprises the following steps: the system takes a section of text as input and outputs structured road condition information; a classification model judges whether the text contains road condition information, and processing ends immediately if it does not; POIs and EVENTs in the text are extracted using an existing POI information base and an EVENT information base; the POIs and EVENTs are formed into candidate pairs <entity 1, entity 2, context>; the relation of each candidate pair is judged using a neural-network-based relation model; related candidate pairs are associated to form complete road condition information; and structured traffic information <place, event> is output. The invention reduces the number and complexity of manually constructed features, avoids conflicts between new and existing features during expansion, reduces maintenance cost, enhances extensibility, and greatly improves the recall rate.

Description

Road condition information extraction method based on neural network
Technical Field
The invention relates to a road condition information extraction method based on a neural network, and belongs to the field of intelligent semantics and voice recognition.
Background
With economic development, the number of urban vehicles keeps growing and urban traffic congestion is increasingly serious. Citizens urgently need good route planning when they travel, and the collection of road traffic information is an important foundation of urban traffic information services. Road traffic information collection is mainly divided into the collection of traffic flow information and the collection of traffic events. The former mainly relies on floating-car traffic information acquisition, video monitoring systems, acquisition systems based on sensors such as microwave and radar, and the collection of client travel information through mobile apps; its timeliness and accuracy have reached a high level.
However, the latter involves situations such as traffic accidents, road construction, traffic control and natural disasters. Such events are sudden, diverse and complex, and are difficult to collect by machine. At present they are collected mainly through user reports. According to statistics, more than 80% of real-time dynamic traffic event collection in 2016 came from user reports. In addition, events can be extracted from text obtained by web crawlers (traffic management websites and microblogs) and from transcribed traffic broadcast speech. Structured text can already be handled by machine. Unstructured (mostly spoken-language) text, such as speech recognition output and microblog text, requires semantic understanding to extract the correct location description, event type, occurrence time and so on.
In recent years, deep neural networks and related technologies have developed rapidly in image processing, speech recognition, natural language processing and other areas. Information extraction from road condition text is a complicated natural language processing problem. The GRU (Gated Recurrent Unit) and the Attention mechanism in deep neural networks have been applied to a series of natural language processing tasks, including Chinese word segmentation, text classification, named entity recognition, entity relation extraction and word vector representation, and have greatly improved the metrics of these tasks. The invention applies deep neural network technology to a road condition information extraction system.
Unstructured text data is characterized by complex and varied spoken descriptions, by location and event descriptions of several pieces of road condition information being mixed together, and by the need to combine external geographic information for understanding. Previous methods for extracting information from such text mainly use manually constructed rule bases or traditional machine learning methods such as SVM. They all require many experts to build the rule or feature library, which is costly; the features are closely tied to cities, speakers, seasons, specific scenes and so on, so extensibility is poor; and although the accuracy of the model is high, the recall rate of road condition extraction is low.
Disclosure of Invention
Aiming at the problems in the prior art, the invention combines simple features with a neural network, thereby reducing the number and complexity of manually constructed features. To solve these problems, the invention adopts the following technical scheme:
a road condition information extraction method based on a neural network comprises the following steps:
I. the input of the whole system is a section of text, which may or may not contain road condition information, and the output is structured road condition information;
II. classifying the input text by combining keywords, rules and an SVM, wherein the classification criterion is whether valid road condition information is present, and only texts containing road condition information are processed further;
III. extracting POIs and EVENTs in the text by using the existing geographic position information base and road condition EVENT information base, to form location information candidate pairs <POI1, POI2> and location information-road condition information candidate pairs <POI, EVENT>;
IV. judging the relations of the candidate pairs by combining a neural network model, manual rules and introduced external information, to form complete road condition event information.
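Steps II and III can be illustrated with a minimal sketch. It is not the patented implementation: the two lexicons, the `Mention` type and all function names are hypothetical, and the gate uses a plain dictionary lookup where the source combines keywords, rules and an SVM.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical lexicons standing in for the POI and EVENT information bases.
POI_LEXICON = {"Ring Road 3", "Bridge A", "Bridge B"}
EVENT_LEXICON = {"accident", "congestion", "construction"}


@dataclass(frozen=True)
class Mention:
    text: str
    kind: str   # "POI" or "EVENT"
    start: int  # character offset in the input text


def contains_road_condition(text):
    """Step II stand-in (assumption): keep a text only if it names a POI and an EVENT."""
    has_poi = any(p in text for p in POI_LEXICON)
    has_event = any(e in text for e in EVENT_LEXICON)
    return has_poi and has_event


def extract_mentions(text):
    """Step III, part 1: dictionary lookup of every POI and EVENT occurrence."""
    mentions = []
    for kind, lexicon in (("POI", POI_LEXICON), ("EVENT", EVENT_LEXICON)):
        for term in lexicon:
            pos = text.find(term)
            while pos != -1:
                mentions.append(Mention(term, kind, pos))
                pos = text.find(term, pos + 1)
    return sorted(mentions, key=lambda m: m.start)


def candidate_pairs(text):
    """Step III, part 2: form <POI1, POI2> and <POI, EVENT> candidate pairs."""
    if not contains_road_condition(text):
        return []  # texts without road condition information are not processed
    mentions = extract_mentions(text)
    pois = [m for m in mentions if m.kind == "POI"]
    events = [m for m in mentions if m.kind == "EVENT"]
    poi_poi = list(combinations(pois, 2))
    poi_event = [(p, e) for p in pois for e in events]
    return poi_poi + poi_event
```

Every pair produced here is only a candidate; deciding which pairs are actually related is the job of step IV.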
Step IV comprises the following specific judgment steps:
A. segmenting the text into words, wherein a manually constructed keyword library is introduced into the segmentation, and the keywords are words that are key to matching road section information and event information in spoken language;
B. carrying out road section information matching and event information matching through a neural network.
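The effect of the keyword library in step A can be sketched with forward maximum matching, a simple dictionary-based segmenter chosen here for illustration; the source does not name the segmentation algorithm. The vocabulary, the keyword set and the function name are hypothetical.

```python
def segment(text, vocabulary, max_len=6):
    """Forward maximum matching: at each position take the longest known word.

    Characters that start no known word are emitted as single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical base vocabulary and manually constructed keyword library.
BASE_VOCAB = {"三环路", "四环路", "拥堵"}
KEYWORDS = {"到", "没有"}  # a separating word and a negative word

tokens = segment("三环路到四环路没有拥堵", BASE_VOCAB | KEYWORDS)
```

With the keyword library merged into the vocabulary, the negation "没有" survives as one token; without it, the same call splits it into two single characters and the negation is harder to match downstream.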
The neural network matching of step B comprises the following steps:
b1, judging the relation for two given entities, namely POI-POI or POI-EVENT, and the context in which they occur: if both entities are POIs, judging whether they form the location description of the same road condition EVENT, namely whether they are Road-Start, Start-End or unrelated; if the two entities are a POI and an EVENT, judging whether they form a road condition EVENT, namely whether they are related or unrelated;
b2, representing the words in the context by word embeddings obtained by pre-training on large-scale text; the pre-training text comprises Wikipedia and collected annotated road condition text; the entities to be predicted are replaced by POI and EVENT, two features, place type and EVENT type, are introduced to represent the entities to be predicted, and special words with the same meaning are replaced by a unified word vector;
b3, performing position encoding according to the relative position of each word to the entities in the context;
b4, preprocessing the text and the entities, and using some simple context features as model input;
b5, concatenating the word embedding and the position encoding into an input vector;
b6, encoding the input vector sequence with a bidirectional GRU encoder to obtain context representation vectors;
b7, weighting the encoded context representation vectors with an Attention mechanism;
b8, concatenating the weighted context representation vector with the manually constructed context features;
b9, classifying the entity relation with a fully connected network.
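Steps b3 and b5 can be sketched as follows. This is an assumption-laden illustration: the source does not specify the encoding, so the clipped signed offset used here is one common choice, and the raw offsets are appended directly where a trained model would more likely look them up in a learned position embedding table. All names and the toy vectors are hypothetical.

```python
def relative_positions(num_tokens, entity_index, max_distance=10):
    """Signed distance of every token from one entity, clipped to +/- max_distance."""
    return [
        max(-max_distance, min(max_distance, i - entity_index))
        for i in range(num_tokens)
    ]


def build_inputs(tokens, e1_index, e2_index, word_vectors, max_distance=10):
    """Concatenate each word vector with its offsets to both entities (step b5)."""
    pos1 = relative_positions(len(tokens), e1_index, max_distance)
    pos2 = relative_positions(len(tokens), e2_index, max_distance)
    dim = len(next(iter(word_vectors.values())))
    unknown = [0.0] * dim  # fallback vector for out-of-vocabulary words
    return [
        list(word_vectors.get(tok, unknown)) + [float(p1), float(p2)]
        for tok, p1, p2 in zip(tokens, pos1, pos2)
    ]


# Toy 2-dimensional "pre-trained" vectors, for illustration only.
toy_vectors = {"POI": [1.0, 0.0], "EVENT": [0.0, 1.0], "到": [0.5, 0.5]}
inputs = build_inputs(["POI", "到", "POI", "EVENT"], 0, 2, toy_vectors)
```

Each row of `inputs` is what the encoder of step b6 would consume for one token.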
Through implementation of this technical scheme, the number and complexity of manually constructed features are reduced. For example, the prior art needs about 30 people to design features manually for four major cities (Beijing, Shanghai, Shenzhen and Shenyang), whereas this method avoids that labor cost entirely. In addition, no conflicts between new and existing features arise during expansion, so the model maintenance cost is reduced and the extensibility of the model is enhanced. With accuracy slightly improved over traditional machine learning with elaborate feature engineering, the recall rate is greatly improved, by about 15%.
Drawings
FIG. 1 is a general flow diagram of the system;
FIG. 2 is a detailed flow of the relationship determination model.
Detailed Description
Description of related terms:
POI: a place word, i.e. a geographic description; there are three types of POI (point of interest): Road (road name), Start (road section starting point) and End (road section ending point);
EVENT: a description of a traffic event;
when the entity pair is POI-POI, there are three types of relation: Road-Start, Start-End and unrelated; if two entity pairs with the relations Road-Start and Start-End share the same starting point, they are considered to form a Road-Start-End relation;
when the entity pair is POI-EVENT, there are two types of relation: place-EVENT related and place-EVENT unrelated.
Specific embodiments of the invention will now be described in detail with reference to the accompanying drawings. A road condition information extraction method based on a neural network mainly comprises the following implementation steps:
V. the input of the whole system is a section of text, which may or may not contain road condition information, and the output is structured road condition information;
VI. classifying the input text by combining keywords, rules and an SVM (support vector machine), wherein the classification criterion is whether valid road condition information is present, and only texts containing road condition information are processed further; this step mainly aims to suppress false alarms, for example those caused by location and event words that appear in a text only because of speech transcription errors;
VII. extracting POIs and EVENTs in the text by using the existing geographic position information base and road condition EVENT information base, to form location information candidate pairs <POI1, POI2> and location information-road condition information candidate pairs <POI, EVENT>;
VIII. judging the relations of the candidate pairs by combining a neural network model, manual rules and introduced external information, to form complete road condition event information:
A. segmenting the text into words, wherein a manually constructed keyword library is introduced into the segmentation, and the keywords are words that are key to matching road section information and event information in spoken language, such as separating words and negative words;
B. carrying out road section information matching and event information matching through the neural network shown in FIG. 2:
b1, judging the relation for two given entities, namely POI-POI or POI-EVENT, and the context in which they occur: if both entities are POIs, judging whether they form the location description of the same road condition EVENT, namely whether they are Road-Start, Start-End or unrelated; if the two entities are a POI and an EVENT, judging whether they form a road condition EVENT, namely whether they are related or unrelated;
b2, representing the words in the context by word embeddings obtained by pre-training on large-scale text; the pre-training text comprises Wikipedia and collected annotated road condition text; the entities to be predicted (place words and EVENTs) are replaced by POI and EVENT, and two features, place type and EVENT type, are introduced to represent the entities to be predicted; special words with the same meaning are replaced by a unified word vector, for example words between entities that express the start-end relation of a road section, such as 'arrival', 'queue up' and 'queue tail', and words after an entity that denote roads, junctions, sentry boxes, traffic lights and the like; the purpose is to solve the unknown-word problem caused by abbreviated names and the like;
b3, performing position encoding according to the relative position of each word to the entities in the context;
b4, preprocessing the text and the entities, and using some simple context features as model input; these are features that need no complex design, such as whether the two geographic description entities are road nodes (obtained from map nodes), the distance between the entities, the order of the entities, the number of places and events between the entities, and whether a separating word lies between the entities;
b5, concatenating the word embedding and the position encoding into an input vector;
b6, encoding the input vector sequence with a bidirectional GRU encoder to obtain context representation vectors;
b7, weighting the encoded context representation vectors with an Attention mechanism;
b8, concatenating the weighted context representation vector with the manually constructed context features;
b9, classifying the entity relation with a fully connected network.
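Steps b7 and b8 can be sketched in plain Python. The source does not say how attention scores are computed, so the dot product with a learned query vector used below is an assumption, and `attention_pool`, `final_representation` and the toy inputs are hypothetical names and values.

```python
import math


def attention_pool(hidden_states, query):
    """Softmax-weighted sum of encoder outputs, scored by dot product with `query`."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return pooled, weights


def final_representation(hidden_states, query, handcrafted_features):
    """Step b8: append the simple context features to the attended context vector."""
    pooled, _ = attention_pool(hidden_states, query)
    return pooled + list(handcrafted_features)


# Two toy encoder outputs; a zero query gives uniform attention weights.
states = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(states, [0.0, 0.0])
```

The vector returned by `final_representation` is what the fully connected classifier of step b9 would receive.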
The implementation steps, with reference to FIG. 1, are as follows:
1) the system takes a section of text as input and outputs structured road condition information;
2) a classification model judges whether the text contains road condition information; if it does not, processing ends without extraction;
3) extracting POIs and EVENTs in the text by using an existing POI information base and an EVENT information base;
4) forming the POIs and EVENTs into a plurality of relation candidate pairs <entity 1, entity 2, context>;
5) judging the relation of each candidate pair using a neural-network-based relation model: if entities 1 and 2 are both POIs, judging whether the relation is Start-End, Road-Start or unrelated; if entity 1 is a POI and entity 2 is an EVENT, judging whether the place-road condition relation is related or unrelated;
6) associating the related candidate pairs to form complete road condition information;
7) outputting structured traffic information <place, event>.
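Step 6 can be sketched as a small merge over the classifier's outputs. It follows the rule stated in the term descriptions, that a Road-Start pair and a Start-End pair sharing the same starting point form one Road-Start-End location. The input tuple layout, the dictionary keys and the function name are hypothetical.

```python
def assemble(poi_relations, event_relations):
    """Merge classified candidate pairs into <place, event> records.

    poi_relations: (label, poi_a, poi_b) tuples, label "Road-Start" or "Start-End".
    event_relations: (poi, event) pairs that the model classified as related.
    """
    # Road-Start pairs are (road, start); index them by their starting point.
    start_to_road = {b: a for label, a, b in poi_relations if label == "Road-Start"}
    # A Start-End pair whose start is also the start of a road completes a place.
    places = [
        {"road": start_to_road[a], "start": a, "end": b}
        for label, a, b in poi_relations
        if label == "Start-End" and a in start_to_road
    ]
    records = []
    for poi, event in event_relations:
        # Attach the event to the merged place that mentions this POI, if any.
        place = next((p for p in places if poi in p.values()), {"poi": poi})
        records.append((place, event))
    return records
```

A POI that belongs to no merged Road-Start-End location is kept as a standalone place, so no related event is dropped.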
The relation judgment process, with reference to FIG. 2, is as follows:
1) inputting a relation candidate pair <entity 1, entity 2, context>;
2) performing Chinese word segmentation on the context;
3) representing the words using pre-trained word vectors;
4) preprocessing words with special meanings;
5) position-encoding each word according to its relative position in the context, to form a position vector;
6) concatenating the word vector and the position vector into the input vector of the neural network;
7) generating a representation of the context using a bidirectional GRU (Bi-GRU) and an Attention mechanism;
8) combining feature information (such as whether the two POIs intersect on the map) with the context representation to form the final representation of the relation pair;
9) classifying the final representation of the relation pair using a fully connected neural network;
10) outputting the classification result.
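The preprocessing in step 4, together with the replacement of entities by POI and EVENT placeholders described earlier, can be sketched as a single pass over the segmented context. The synonym table and the function name are hypothetical; a real table would be built by hand for the special words.

```python
# Hypothetical synonym table: connectives that signal a start-to-end relation
# all map to one shared token, so that they share one word vector.
CANONICAL = {"至": "到", "直到": "到"}


def mask_context(tokens, entity_types):
    """Replace entity tokens with type placeholders and normalise special words.

    entity_types maps a token index to "POI" or "EVENT".
    """
    return [
        entity_types.get(i, CANONICAL.get(tok, tok))
        for i, tok in enumerate(tokens)
    ]


masked = mask_context(["三环路", "至", "四环路", "拥堵"], {0: "POI", 2: "POI", 3: "EVENT"})
```

After masking, a road name never seen in training reaches the network as the known token POI, which is how this step sidesteps unknown words.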
Through the implementation of the technical scheme of the invention, the following technical effects are brought:
1) apart from the externally introduced information, the neural-network-based model needs no complicated rule or feature library, unlike the traditional methods, so cost is reduced.
Typical features used in rule-based and traditional machine learning include, but are not limited to, the following:
[Table of typical features; present only as images (BDA0002247859950000051, BDA0002247859950000061) in the original publication]
The present invention reduces the required features to only the following basic features:
1  Whether the two POIs are nodes
2  Distance between the two POIs
3  Number of POIs between the two POIs
4  Number of EVENTs between the two POIs
5  POI types
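The five basic features can be computed from the extracted mentions alone, as the sketch below suggests. Several readings are assumptions: "distance" is taken as token distance in the context (the source may intend map distance), and feature 1 is read as "both POIs are road nodes". The `Mention` type, its fields and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    kind: str              # "POI" or "EVENT"
    index: int             # token position in the context
    poi_type: str = ""     # "Road", "Start" or "End"; empty for events
    is_node: bool = False  # whether the map marks this POI as a road node


def basic_features(a, b, mentions):
    """Compute the five basic features for a POI pair (a, b)."""
    lo, hi = sorted((a.index, b.index))
    between = [m for m in mentions if lo < m.index < hi]
    return {
        "both_nodes": a.is_node and b.is_node,                        # feature 1
        "distance": hi - lo,                                          # feature 2
        "pois_between": sum(m.kind == "POI" for m in between),        # feature 3
        "events_between": sum(m.kind == "EVENT" for m in between),    # feature 4
        "poi_types": (a.poi_type, b.poi_type),                        # feature 5
    }
```

None of these features depends on a city-specific rule, which is consistent with the claim that only new training data is needed when the system is extended.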
2) if the system must be extended to new cities, data sources and the like, only new training data needs to be collected to train the model, which removes the problems of conflicts between new and old rule features and of their maintenance, and enhances extensibility;
3) the disclosed system greatly improves the recall rate of road condition information extraction, by about 15%, while keeping the accuracy similar to that of other methods.
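For reference, the two quantities compared above are conventionally defined as follows, where TP, FP and FN (symbols not used in the source) count correctly extracted, wrongly extracted and missed road condition records. The source does not state whether its "accuracy" is precision in this sense, nor whether the improvement of about 15% is in absolute percentage points or relative to the baseline, so this reading is an assumption.

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
```

Under these definitions, raising recall at similar precision means that FN falls while FP stays roughly proportional to TP.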

Claims (3)

1. A road condition information extraction method based on a neural network comprises the following steps:
I. the input of the whole system is a section of text, which may or may not contain road condition information, and the output is structured road condition information;
II. classifying the input text by combining keywords, rules and an SVM, wherein the classification criterion is whether valid road condition information is present, and only texts containing road condition information are processed further;
III. extracting POIs and EVENTs in the text by using the existing geographic position information base and road condition EVENT information base, to form location information candidate pairs <POI1, POI2> and location information-road condition information candidate pairs <POI, EVENT>;
IV. judging the relations of the candidate pairs by combining a neural network model, manual rules and introduced external information, to form complete road condition event information.
2. The road condition information extraction method based on the neural network as claimed in claim 1, wherein step IV comprises the following specific judgment steps:
A. segmenting the text into words, wherein a manually constructed keyword library is introduced into the segmentation, and the keywords are words that are key to matching road section information and event information in spoken language;
B. carrying out road section information matching and event information matching through a neural network.
3. The road condition information extraction method based on the neural network as claimed in claim 2, wherein the neural network matching of step B comprises the following steps:
b1, judging the relation for two given entities, namely POI-POI or POI-EVENT, and the context in which they occur: if both entities are POIs, judging whether they form the location description of the same road condition EVENT, namely whether they are Road-Start, Start-End or unrelated; if the two entities are a POI and an EVENT, judging whether they form a road condition EVENT, namely whether they are related or unrelated;
b2, representing the words in the context by word embeddings obtained by pre-training on large-scale text; the pre-training text comprises Wikipedia and collected annotated road condition text; the entities to be predicted are replaced by POI and EVENT, two features, place type and EVENT type, are introduced to represent the entities to be predicted, and special words with the same meaning are replaced by a unified word vector;
b3, performing position encoding according to the relative position of each word to the entities in the context;
b4, preprocessing the text and the entities, and using some simple context features as model input;
b5, concatenating the word embedding and the position encoding into an input vector;
b6, encoding the input vector sequence with a bidirectional GRU encoder to obtain context representation vectors;
b7, weighting the encoded context representation vectors with an Attention mechanism;
b8, concatenating the weighted context representation vector with the manually constructed context features;
b9, classifying the entity relation with a fully connected network.
CN201911023161.1A 2019-10-25 2019-10-25 Road condition information extraction method based on neural network Pending CN110807070A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911023161.1A CN110807070A (en) 2019-10-25 2019-10-25 Road condition information extraction method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911023161.1A CN110807070A (en) 2019-10-25 2019-10-25 Road condition information extraction method based on neural network

Publications (1)

Publication Number Publication Date
CN110807070A true CN110807070A (en) 2020-02-18

Family

ID=69489108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911023161.1A Pending CN110807070A (en) 2019-10-25 2019-10-25 Road condition information extraction method based on neural network

Country Status (1)

Country Link
CN (1) CN110807070A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504746A (en) * 2016-10-28 2017-03-15 普强信息技术(北京)有限公司 A kind of method for extracting structuring traffic information from speech data
JP2017208045A (en) * 2016-05-20 2017-11-24 日本電信電話株式会社 Characteristic understanding device, method, and program
US20180196881A1 (en) * 2017-01-06 2018-07-12 Microsoft Technology Licensing, Llc Domain review system for identifying entity relationships and corresponding insights
CN108875007A (en) * 2018-06-15 2018-11-23 腾讯科技(深圳)有限公司 The determination method and apparatus of point of interest, storage medium, electronic device
CN109902145A (en) * 2019-01-18 2019-06-18 中国科学院信息工程研究所 A method and system for joint entity relation extraction based on attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马语丹 等: "结合实体共现信息与句子语义特征的关系抽取方法", 中国科学:信息科学 *

Similar Documents

Publication Publication Date Title
CN106504746B (en) Method for extracting structured traffic road condition information from voice data
CN100573506C (en) A kind of space-time fusion method of natural language expressing dynamic traffic information
CN111524353B (en) A method of traffic text data for speed prediction and trip planning
CN105243128A (en) Sign-in data based user behavior trajectory clustering method
CN116108169B (en) Hot wire work order intelligent dispatching method based on knowledge graph
CN111931998B (en) A method and system for predicting individual travel patterns based on mobile positioning data
CN110807552A (en) A construction method of urban electric bus driving conditions based on improved K-means
CN114266316B (en) Hierarchical graph convolutional network-based carbon footprint-user classification method
CN114896523B (en) Road planning method and device based on country tourism line
CN115017425B (en) Location search method, location search device, electronic device, and storage medium
CN113159403A (en) Method and device for predicting pedestrian track at intersection
CN116824868B (en) Method, device, equipment and medium for identifying illegal parking points and predicting congestion of vehicles
CN115565376B (en) Vehicle journey time prediction method and system integrating graph2vec and double-layer LSTM
CN115100395B (en) A method for urban block function classification integrating POI pre-classification and graph neural network
CN113495929B (en) Triplet extraction method based on self-attention
CN117827863B (en) Atmospheric environment monitoring and analysis method and system based on CLDAS database
CN111678531B (en) Subway path planning method based on LightGBM
CN111444286B (en) Long-distance traffic node relevance mining method based on trajectory data
CN113159371A (en) Unknown target feature modeling and demand prediction method based on cross-modal data fusion
CN118247953A (en) Traffic flow prediction method and device by combining rainfall and space-time diagram convolution model
CN116127096A (en) A Construction Method of Traffic Knowledge Graph Based on Multi-source Data Fusion
CN110807070A (en) Road condition information extraction method based on neural network
CN116484859A (en) Police condition space position positioning method and related products
CN117520672A (en) Attention mechanism-based basic layer data original address and standard address association method
CN115907012A (en) A Data Mining Method Based on Power Supply Service Information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20200218