
CN113157829B - Method, device, electronic device and storage medium for comparing names of points of interest

Info

Publication number
CN113157829B
Authority
CN
China
Prior art keywords
name
interest
point
similarity
model
Prior art date
Legal status
Active
Application number
CN202010074812.6A
Other languages
Chinese (zh)
Other versions
CN113157829A (en)
Inventor
宋宽
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010074812.6A
Publication of CN113157829A
Application granted
Publication of CN113157829B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries


Abstract

The present application discloses a method, device, electronic device and storage medium for comparing point of interest names, and relates to the field of artificial intelligence technology. The specific implementation scheme is: obtaining a first point of interest name and a second point of interest name to be compared; calculating the similarity between the first point of interest name and the second point of interest name according to the semantic representation features corresponding to the first point of interest name and the second point of interest name respectively; wherein the semantic representation features include a point of interest name semantic similarity feature and at least one point of interest name associated semantic feature. The embodiment of the present application calculates the similarity of two point of interest names based on the point of interest name semantic similarity feature and at least one point of interest name associated semantic feature, thereby improving the accuracy of calculating the similarity of point of interest names.

Description

Point-of-interest name comparison method, device, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of Internet, in particular to the technical field of artificial intelligence, and specifically relates to a method, a device, electronic equipment and a storage medium for comparing interest point names.
Background
When processing massive map interest point information data, whether different information data describe the same interest point needs to be judged. The name similarity among different interest points is the most important judgment basis, and the accuracy of the name similarity calculation directly influences the accuracy of the subsequent calculation flow.
The similarity calculation of map interest point names currently adopts the following two schemes:
(1) A deep semantic matching method based on a bidirectional LSTM (Long Short-Term Memory) model: this scheme uses a deep neural network based on a bidirectional LSTM to perform name similarity calculation, obtaining a similarity probability (a continuous value in [0, 1], where 0 represents dissimilarity and 1 represents similarity); (2) A deep semantic matching method based on attention: this scheme uses the attention mechanism of the Transformer architecture to construct a deep neural network for name similarity calculation, likewise obtaining a similarity probability (a continuous value in [0, 1], where 0 represents dissimilarity and 1 represents similarity).
However, both of these approaches still suffer from certain drawbacks. Scheme (1) has a forward/reverse problem when calculating the similarity of two map interest point names: when the order of the two compared names is swapped, the similarity calculated by the model differs greatly. Scheme (2) solves the problem of scheme (1), but when two interest point names are textually similar yet belong to different industries, its similarity calculation is inaccurate.
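The forward/reverse defect of scheme (1) can be illustrated with a deliberately simplified toy matcher (the functions below are illustrative stand-ins, not the patent's networks):

```python
# Toy illustration of the order-sensitivity defect described for scheme (1).

def order_sensitive_score(a: str, b: str) -> float:
    """Hypothetical matcher whose score depends on the order of its inputs,
    because the first name and the second name are encoded differently."""
    h = sum(ord(c) for c in a) + 2 * sum(ord(c) for c in b)
    return (h % 997) / 997.0

def symmetric_score(a: str, b: str) -> float:
    """The same matcher made order-invariant by averaging both directions."""
    return (order_sensitive_score(a, b) + order_sensitive_score(b, a)) / 2

a, b = "Shaxian snacks", "Shaxian snack house"
print(order_sensitive_score(a, b) == order_sensitive_score(b, a))  # False
print(symmetric_score(a, b) == symmetric_score(b, a))              # True
```

Averaging both directions is only one simple way to restore symmetry; it does not address the cross-industry confusion described for scheme (2).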
Disclosure of Invention
The embodiment of the application discloses a method, a device, electronic equipment and a medium for comparing the names of interest points, so as to achieve the aim of accurately calculating the similarity of the names of the interest points.
In a first aspect, an embodiment of the present application discloses a method for comparing interest point names, including:
acquiring a first interest point name and a second interest point name to be compared;
according to semantic representation features respectively corresponding to the first interest point name and the second interest point name, calculating the similarity between the first interest point name and the second interest point name;
Wherein the semantic representation features include: the interest point name semantic similarity features and at least one interest point name associated semantic feature;
The interest point name semantic similarity features are differentiated semantic features between the interest point names and other interest point names, and the interest point name associated semantic features are semantic features of other attributes associated with the interest point names.
One embodiment of the above application has the following advantages or benefits: the similarity of the two interest point names is calculated according to both the interest point name semantic similarity feature and the at least one interest point name associated semantic feature; compared with calculating the similarity only from the feature representation of the interest point names themselves, this improves the accuracy of calculating the similarity of interest point names.
In addition, the method for comparing the names of the interest points according to the embodiment of the application can also have the following additional technical features:
optionally, the semantic feature related to the interest point name includes: industry attribute characteristics of the interest point names and/or word attribute characteristics of each word in the interest point names;
The industry attribute features of the interest point names are semantic features of industry attributes of the industries to which the interest point names belong, and the word attribute features of the segmentation words in the interest point names are semantic features of word attributes to which the segmentation words in the interest point names belong.
One embodiment of the above application has the following advantages or benefits: when the similarity of the two interest point names is determined, the accuracy of similarity recognition can be further improved by combining the industry attribute characteristics of the interest point names and/or the word attribute characteristics of each word in the interest point names.
Optionally, calculating the similarity between the first interest point name and the second interest point name according to semantic representation features corresponding to the first interest point name and the second interest point name respectively includes:
Respectively inputting the first interest point name and the second interest point name into a pre-trained multi-task learning model;
obtaining the similarity between the first interest point name and the second interest point name output by the multi-task learning model;
the multi-task learning model includes: the interest point name similarity recognition sub-model and at least one interest point name association feature recognition sub-model are arranged in parallel; the interest point name similarity recognition sub-model and each interest point name association characteristic recognition sub-model share the same semantic representation layer;
The semantic representation layer is used for acquiring the semantic representation features respectively corresponding to the first interest point name and the second interest point name; the interest point name similarity recognition sub-model is used for outputting similarity between the first interest point name and the second interest point name.
One embodiment of the above application has the following advantages or benefits: the semantic representation layers in multiple single-task deep neural networks are parameter-shared, and the multiple single-task networks are combined into one multi-task learning model; the shared semantic representation layer can learn deeper semantic expressions of interest point names, which ensures the accuracy of interest point similarity recognition.
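The parameter-sharing layout described above can be sketched structurally in plain Python (no deep-learning framework; the class names, the character-frequency encoder, and the cosine comparison are illustrative assumptions, not the patent's implementation):

```python
class SharedSemanticLayer:
    """Stands in for the shared semantic representation layer."""
    def encode(self, name: str) -> list:
        # character-frequency vector as a toy placeholder for real features
        return [name.count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

class SimilarityHead:
    """Stands in for the similarity comparison layer (here: cosine)."""
    def compare(self, va, vb) -> float:
        dot = sum(x * y for x, y in zip(va, vb))
        na = sum(x * x for x in va) ** 0.5
        nb = sum(x * x for x in vb) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

class MultiTaskModel:
    def __init__(self):
        self.shared = SharedSemanticLayer()      # parameters shared by all heads
        self.similarity_head = SimilarityHead()  # name similarity sub-model
        # industry-classification and word-tagging heads would attach here,
        # in parallel, reusing self.shared

    def similarity(self, name_a: str, name_b: str) -> float:
        return self.similarity_head.compare(
            self.shared.encode(name_a), self.shared.encode(name_b))

model = MultiTaskModel()
print(model.similarity("peking university", "peking univ"))
```

In a real implementation the shared layer would be a trained encoder and the comparison head would be the cross-attention comparison layer; only the wiring (one shared layer, parallel heads) is the point of this sketch.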
Optionally, before acquiring the first interest point name and the second interest point name to be compared, the method further includes:
Obtaining training samples which are respectively matched with the interest point name similarity recognition sub-model and the interest point name association characteristic recognition sub-model;
And training the multi-task learning model by using the training sample, and obtaining the trained multi-task learning model when the training ending condition is met.
One embodiment of the above application has the following advantages or benefits: training the multi-task model with different training samples enables the shared semantic representation layer to learn deeper semantic expressions of interest point names, which in turn ensures the accuracy of similarity recognition by the trained multi-task model.
Optionally, the interest point name similarity recognition sub-model specifically includes: the semantic representation layer and the cross-attention comparison layer;
The semantic representation layer is used for acquiring semantic representation features corresponding to the two input interest point names respectively, and inputting the semantic representation features to the cross attention comparison layer for similarity comparison;
The cross attention comparison layer is used for outputting the similarity between the two interest point names.
Optionally, the interest point name associated feature recognition sub-model includes an interest point name industry classification sub-model; the interest point name industry classification sub-model comprises: the semantic representation layer and a fully-connected classification layer;
the semantic representation layer is used for acquiring semantic representation features corresponding to the input interest point names, and inputting the semantic representation features into the fully-connected classification layer for industry classification;
and the full-connection classification layer is used for outputting an industry classification result of the interest point name.
Optionally, the interest point name association feature recognition sub-model further comprises an interest point name word segmentation analysis sub-model; the interest point name word segmentation analysis sub-model comprises the following steps: the semantic representation layer and the conditional random field layer;
The semantic representation layer is used for acquiring semantic representation features corresponding to the input interest point names, and inputting the semantic representation features to the conditional random field layer for identifying word attributes;
the conditional random field layer is used for outputting the word attribute of each word in the interest point name.
Optionally, training the multi-task learning model by using the training sample, and when the training ending condition is met, obtaining the trained multi-task learning model, including:
Acquiring a first training sample set matched with the interest point name similarity recognition sub-model and a second training sample set respectively corresponding to each interest point name associated feature recognition sub-model;
Respectively acquiring a set number of training samples from the first training sample set and each second training sample set, and sequentially adding the training samples into a newly-built multi-task training sample set;
After training the multi-task learning model by using the multi-task training sample set, judging whether the multi-task learning model meets a training ending condition;
and if not, returning to execute the operation of acquiring the set number of training samples from the first training sample set and each second training sample set respectively until the training ending condition is met, and obtaining the trained multi-task learning model.
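The sampling-and-training loop in the steps above can be sketched as follows (the sample pools, the batch size per task, and the fixed round count are toy stand-ins; the real end condition is the accuracy test on the similarity sub-model, not a round limit):

```python
import random

random.seed(0)
# toy stand-ins for the first training sample set and the second sample sets
similarity_samples = [("sim", i) for i in range(100)]
industry_samples = [("industry", i) for i in range(100)]
wordseg_samples = [("wordseg", i) for i in range(100)]
BATCH_PER_TASK = 8   # the "set number" of training samples drawn per task
MAX_ROUNDS = 5       # placeholder for the real training ending condition

rounds = 0
while True:
    # newly built multi-task training sample set for this round
    multi_task_batch = []
    for pool in (similarity_samples, industry_samples, wordseg_samples):
        multi_task_batch.extend(random.sample(pool, BATCH_PER_TASK))
    # ... one training pass of the multi-task learning model would go here ...
    rounds += 1
    if rounds >= MAX_ROUNDS:   # stand-in for "training ending condition met"
        break
print(rounds, len(multi_task_batch))
```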
Optionally, determining whether the multitask learning model meets the training end condition includes:
Using a test sample set to carry out identification accuracy test on the interest point name similarity identification sub-model;
And if the recognition accuracy of the interest point name similarity recognition sub-model exceeds a set threshold, determining that the training ending condition is met.
One embodiment of the above application has the following advantages or benefits: and the verification is performed through the verification sample, so that the quasi-determination of similarity recognition of the trained multitask model can be ensured.
In a second aspect, an embodiment of the present application discloses a device for comparing names of interest points, including:
the interest point name acquisition module is used for acquiring a first interest point name and a second interest point name to be compared;
the similarity calculation module is used for calculating the similarity between the first interest point name and the second interest point name according to semantic representation features respectively corresponding to the first interest point name and the second interest point name;
Wherein the semantic representation features include: the interest point name semantic similarity features and at least one interest point name associated semantic feature;
The industry attribute features of the interest point names are semantic features of industry attributes of the industries to which the interest point names belong, and the word attribute features of the segmentation words in the interest point names are semantic features of word attributes to which the segmentation words in the interest point names belong.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the point of interest name comparison method of any embodiment of the present application.
In a fourth aspect, embodiments of the present application further provide a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the point of interest name comparison method according to any of the embodiments of the present application.
Other effects of the above alternative will be described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
FIG. 1 is a flow chart of a method for comparing point of interest names according to a first embodiment of the present application;
FIG. 2 is a flow chart of a method for comparing point of interest names according to a second embodiment of the present application;
FIG. 3a is a flow chart of a training method of a multi-task learning model according to a third embodiment of the present application;
FIG. 3b is a schematic diagram of a structure of a multi-task learning model according to a third embodiment of the present application;
fig. 4 is a schematic structural diagram of a point of interest name comparing apparatus according to a fourth embodiment of the present application;
Fig. 5 is a block diagram of an electronic device for implementing the point-of-interest name comparison method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flow chart of a method for comparing interest point names according to an embodiment of the present application, where the embodiment is applicable to a case of processing massive map interest point information data, for example, judging whether different information data describe the same interest point. The method may be performed by a point of interest name comparison means implemented in software and/or hardware, preferably arranged in an electronic device, such as a server or computer device. As shown in fig. 1, the method specifically includes the following steps:
S101, acquiring a first interest point name and a second interest point name to be compared.
A point of interest (Point Of Interest, POI) is a representation of geographic information; it may be a building, a business, or a bus stop, and the attribute information of each POI generally includes a name, an address, and coordinates. Therefore, when massive map interest point information data are processed, the first interest point name and the second interest point name to be compared can be obtained based on the attribute information of the interest points to be compared.
S102, calculating the similarity between the first interest point name and the second interest point name according to semantic representation features respectively corresponding to the first interest point name and the second interest point name.
Alternatively, the corresponding semantic representation features may be extracted from the first and second interest point names, respectively, by natural language processing techniques or by a pre-trained model (for example, a multi-task learning model); the semantic representation features are typically represented as feature vectors or tensors, which are digitized representations of the interest point names. When calculating the similarity between the first interest point name and the second interest point name, the similarity may be determined, for example, by the cosine or Euclidean distance between the semantic representation features of the two names.
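As an illustrative sketch (the feature values below are made up; real semantic features would come from the trained model), the cosine measure mentioned above can be computed as follows:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

feat_a = [0.8, 0.1, 0.3]   # hypothetical semantic features of the first name
feat_b = [0.7, 0.2, 0.4]   # hypothetical semantic features of the second name
print(round(cosine_similarity(feat_a, feat_b), 4))
```

A cosine value closer to 1 indicates the two names are more similar; the Euclidean distance can be used analogously, with smaller distances indicating higher similarity.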
In an embodiment of the present application, the semantic representation features include: the interest point name semantic similarity feature, and at least one interest point name associated semantic feature. The interest point name semantic similarity features are differentiated semantic features between the interest point names and other interest point names, and the interest point name associated semantic features are semantic features of other attributes associated with the interest point names.
The semantic similarity feature of the interest point name is optionally a feature representation corresponding to the name, for example, may be a feature vector representation of the name; in order to make the calculation result accurate when calculating the name similarity between two interest point names, the difference between the semantic similarity characteristics of the interest point names of different interest point names needs to be enlarged as much as possible, and correspondingly, the more accurate the calculation result is when the similarity distance is finally calculated. Therefore, the semantic similarity characteristics of the interest point names obtained by calculation are differentiated semantic characteristics between the interest point names and other interest point names.
The differentiated semantic features may be obtained by training an interest point name similarity recognition model in advance on training samples; after the interest point name similarity recognition model is successfully trained, the semantic representation layer in the model can be used to obtain the interest point name semantic similarity feature corresponding to an interest point name.
Through the arrangement, when the similarity between the two interest point names is calculated, the interest point names with close names and different industry attributes can be distinguished, so that the calculation accuracy is improved.
In this embodiment, in order to make the calculation result more accurate when calculating the name similarity between two interest point names, besides considering using the interest point name semantic similarity feature, the interest point name associated semantic feature is further used, where the interest point name associated semantic feature is a semantic feature of other attributes associated with the interest point name, so as to further distinguish the difference values of the semantic features of different interest point names.
Optionally, the interest point name associated semantic feature may include: industry attribute features of the interest point name, and/or word attribute features of the individual tokens in the interest point name. For example, the interest point named "Shaxian snacks" has the industry attribute "food"; the interest point named "Beijing University" contains the tokens "Beijing" and "university", where the word attribute of "Beijing" is "geographical location" and the word attribute of "university" is "school".
The method for obtaining the one or more semantic features related to the interest point names may be that an interest point name related feature recognition model is obtained through training a training sample in advance, for example, an interest point name industry classification model is obtained through training the interest point name with the interest point industry classification result marked in advance as a training sample, and after the interest point name industry classification model is successfully obtained through training, a semantic representation layer in the interest point name industry classification model may be used to obtain the industry attribute feature of the interest point name corresponding to the interest point name.
In an optional implementation manner of this embodiment, machine learning models corresponding to the semantic similarity features of the point names of interest and at least one semantic feature associated with the point names of interest may be trained in advance, semantic representation layers in the machine learning models are obtained through training, the semantic similarity features of the point names of interest corresponding to the first point of interest and the second point of interest are obtained through calculation, and the semantic representation features corresponding to the first point of interest and the second point of interest are obtained through combination of the semantic similarity features of the point names of interest and the semantic features associated with the at least one point name of interest.
In another optional implementation manner of this embodiment, a multi-task learning model may be trained as a whole, different learning branches of the multi-task learning model respectively correspond to different machine learning models, and the machine learning models share the same semantic representation layer, after the multi-task learning model is trained and generated, semantic representation features obtained through the semantic representation layer may learn the similarity features of the interest point names and at least one associated semantic feature of the interest point names at the same time.
Correspondingly, if the semantic similarity feature of the interest point name in the semantic representation features of the first interest point name and the second interest point name and the associated semantic feature of at least one interest point name are independently separable, optionally, calculating a similarity result between the semantic similarity feature of the first interest point name and the semantic similarity feature of the second interest point name and a similarity result between the associated semantic feature of the first interest point name and the associated semantic feature of the second interest point name when calculating the similarity between the first interest point name and the second interest point name, and weighting and summing the obtained similarity results to obtain the final similarity between the first interest point name and the second interest point name. Or if the semantic representation features of the first interest point name and the semantic representation features of the second interest point name are simultaneously fused with the semantic similarity features of the interest point names and at least one semantic feature is associated with the interest point names, the similarity result between the semantic representation features of the first interest point name and the semantic representation features of the second interest point name can be directly calculated and used as the similarity of the first interest point name and the second interest point name.
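Under the assumption that the per-feature similarity results have already been computed and that suitable weights are known (all numbers below are made up for illustration), the weighted summation described above can be sketched as:

```python
def combined_similarity(feature_sims, weights):
    """Weighted sum of per-feature similarity results."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights are assumed to sum to 1"
    return sum(s * w for s, w in zip(feature_sims, weights))

# similarity results for the name semantic similarity features, the industry
# attribute features, and the word attribute features of the two names
sims = [0.90, 0.60, 0.75]
weights = [0.60, 0.20, 0.20]   # hypothetical importance of each feature
print(round(combined_similarity(sims, weights), 2))  # 0.81
```

In the fused case described at the end of the paragraph, no weighting is needed and a single similarity over the fused representations is used directly.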
According to the embodiment of the application, the similarity of the two interest point names is calculated according to the semantic similarity characteristics of the interest point names and the associated semantic characteristics of at least one interest point name, and compared with the similarity of the two interest point names calculated only according to the characteristic representation of the interest point names, the accuracy of calculating the similarity of the interest point names is improved.
Fig. 2 is a schematic flow chart of a method for comparing interest point names according to an embodiment of the present application, where the method is optimized based on the above embodiment, and referring to fig. 2, the method specifically includes:
S201, acquiring a first interest point name and a second interest point name to be compared.
S202, the first interest point name and the second interest point name are respectively input into a pre-trained multi-task learning model.
In an embodiment of the present application, a pre-trained multi-task learning model includes: the interest point name similarity recognition sub-model and at least one interest point name association feature recognition sub-model are arranged in parallel. The interest point name similarity recognition sub-model is used for outputting the similarity between the first interest point name and the second interest point name, and the interest point name similarity recognition sub-model and each interest point name association characteristic recognition sub-model share the same semantic representation layer.
It should be noted that the at least one interest point name associated feature recognition sub-model is only used during the training process of the multi-task learning model; when the trained multi-task learning model is used to calculate the similarity of two interest point names, essentially only the interest point name similarity recognition sub-model in the multi-task model is used. In addition, because the interest point name similarity recognition sub-model and each interest point name associated feature recognition sub-model share the same semantic representation layer, after the multi-task learning model is trained with training samples respectively matched with these sub-models, the shared semantic representation layer can comprehensively and accurately extract the semantic representation features corresponding to an interest point name (for example, obtaining the interest point name semantic similarity feature and at least one interest point name associated semantic feature at the same time), thereby ensuring the accuracy of the subsequent similarity recognition of two interest point names.
Therefore, after a trained multi-task learning model is obtained, the similarity of the two interest point names can be determined according to the output of the multi-task model only by respectively inputting the first interest point name and the second interest point name to be compared into the multi-task learning model.
S203, obtaining the similarity between the first interest point name and the second interest point name output by the multi-task learning model.
After the first interest point name and the second interest point name are respectively input into a pre-trained multi-task learning model, a semantic representation layer of the multi-task learning model acquires semantic representation features respectively corresponding to the first interest point name and the second interest point name, and further, the similarity of the two interest point names is calculated based on the semantic representation features of the two interest point names.
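As an illustrative sketch only (not part of the claimed embodiments): if the trained semantic representation layer reduces each interest point name to a pooled feature vector, one simple way to turn two such vectors into a similarity score is cosine similarity. The vectors and the decision threshold below are hypothetical; the embodiments described here instead use a learned cross-attention comparison layer.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Similarity of two semantic representation vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return dot / (norm_a * norm_b)

# Hypothetical pooled representations of the two interest point names.
repr_first = [0.8, 0.1, 0.3]
repr_second = [0.7, 0.2, 0.4]
similarity = cosine_similarity(repr_first, repr_second)
is_same_poi = similarity > 0.9  # hypothetical decision threshold
```

In practice the comparison function is learned jointly with the semantic representation layer rather than fixed in advance.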
The interest point name semantic similarity features are differentiated semantic features between the interest point names and other interest point names, and the interest point name associated semantic features are semantic features of other attributes associated with the interest point names.
In this embodiment, the semantic representation layers of a plurality of single-task deep neural networks share parameters, and the plurality of single-task networks are combined into one multi-task learning model, so that the parameter-sharing semantic representation layer can capture deeper semantic expressions of interest point names, thereby ensuring the accuracy of interest point name similarity recognition.
Fig. 3a is a flow chart of a training method of a multi-task learning model according to an embodiment of the present application, referring to fig. 3a, the method specifically includes:
S301, training samples respectively matched with the interest point name similarity recognition sub-model and each interest point name association feature recognition sub-model are obtained.
S302, training the multi-task learning model by using the training sample, and obtaining the trained multi-task learning model when the training ending condition is met.
Referring to fig. 3b, there is shown a schematic structural diagram of a multi-task learning model comprising: an interest point name similarity recognition sub-model, an interest point name industry classification sub-model and an interest point name word segmentation analysis sub-model arranged in parallel, wherein the three sub-models share the same semantic representation layer. The interest point name industry classification sub-model and the interest point name word segmentation analysis sub-model are both interest point name association feature recognition sub-models.
The interest point name similarity recognition sub-model specifically comprises: the semantic representation layer and a cross-attention comparison layer. The training sample matched with this sub-model includes two interest point names and a similarity result between the two, for example, "University of Posts and Telecommunications" - "University of Political Science and Law" - "dissimilar". When training the interest point name similarity recognition sub-model, the semantic representation layer is used for acquiring semantic representation features respectively corresponding to the two input interest point names and inputting each semantic representation feature into the cross-attention comparison layer for similarity comparison; the cross-attention comparison layer is used for outputting the similarity between the two interest point names.
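The cross-attention comparison layer aligns the token-level semantic representation features of one name against those of the other before scoring. The following is a minimal, framework-free sketch of the attention step (dot-product scores plus softmax weighting); the token vectors are hypothetical, and a real implementation would additionally use learned projection weights.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(tokens_a, tokens_b):
    """For each token vector of name A, attend over the token vectors
    of name B and return the attended (aligned) representations."""
    aligned = []
    for query in tokens_a:
        scores = [sum(q * k for q, k in zip(query, key)) for key in tokens_b]
        weights = softmax(scores)
        context = [sum(w * key[d] for w, key in zip(weights, tokens_b))
                   for d in range(len(tokens_b[0]))]
        aligned.append(context)
    return aligned

# Hypothetical token-level features for two interest point names.
tokens_a = [[1.0, 0.0], [0.0, 1.0]]
tokens_b = [[0.5, 0.5]]
aligned = cross_attention(tokens_a, tokens_b)
```

The aligned representations would then be compared with the original ones to produce the final similarity score.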
The interest point name industry classification sub-model comprises: the semantic representation layer and a fully-connected classification layer. The training sample matched with this sub-model includes an interest point name and the industry classification matched with that name, for example, "Shaxian snacks" - "food". When training the interest point name industry classification sub-model, the semantic representation layer is used for acquiring semantic representation features corresponding to the input interest point name and inputting them into the fully-connected classification layer for industry classification; the fully-connected classification layer is used for outputting the industry classification result of the interest point name.
The interest point name word segmentation analysis sub-model comprises: the semantic representation layer and a conditional random field (CRF) layer. The training sample matched with this sub-model includes the word segmentation result of an interest point name and the word attribute corresponding to each segmented word. When training the interest point name word segmentation analysis sub-model, the semantic representation layer is used for acquiring semantic representation features corresponding to the input interest point name and inputting them to the conditional random field layer for word attribute identification; the conditional random field layer is used for outputting the word attribute of each segmented word in the interest point name. Because the interest point name similarity recognition sub-model, the interest point name industry classification sub-model and the interest point name word segmentation analysis sub-model share the same semantic representation layer, after training the multi-task learning model with the corresponding training samples, the shared semantic representation layer can comprehensively and accurately extract the semantic representation features corresponding to each interest point name.
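The key structural point, that all three sub-models reference one and the same semantic representation layer, can be sketched as follows. The toy featurization and the logic inside each head are placeholders invented for illustration; only the sharing pattern reflects the described architecture.

```python
class SemanticRepresentationLayer:
    """Shared encoder: maps an interest point name to a feature vector.
    A toy stand-in for a real learned text encoder."""
    def encode(self, name):
        return [len(name), sum(ord(c) for c in name) % 97]

class SimilarityHead:
    """Stand-in for the cross-attention comparison layer."""
    def __init__(self, shared): self.shared = shared
    def forward(self, name_a, name_b):
        fa, fb = self.shared.encode(name_a), self.shared.encode(name_b)
        return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(fa, fb)))

class IndustryClassificationHead:
    """Stand-in for the fully-connected classification layer."""
    def __init__(self, shared): self.shared = shared
    def forward(self, name):
        return "food" if self.shared.encode(name)[0] < 10 else "other"

class WordSegmentationHead:
    """Stand-in for the conditional random field layer."""
    def __init__(self, shared): self.shared = shared
    def forward(self, name):
        return [(token, "core-word") for token in name.split()]

shared_layer = SemanticRepresentationLayer()
heads = [SimilarityHead(shared_layer),
         IndustryClassificationHead(shared_layer),
         WordSegmentationHead(shared_layer)]
# All three sub-models reference the same representation layer object,
# so training any one head updates the shared parameters.
assert all(h.shared is shared_layer for h in heads)
```

In a deep-learning framework the same effect is obtained by passing one encoder module to several task heads, so gradients from every task flow into the shared parameters.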
When the multi-task learning model is specifically trained, a certain number of training samples matched with the interest point name similarity recognition sub-model are first obtained, and the interest point name similarity recognition sub-model in the multi-task learning model is trained; next, a certain number of training samples matched with the interest point name industry classification sub-model are obtained, and the interest point name industry classification sub-model is trained; finally, a certain number of training samples corresponding to the interest point name word segmentation analysis sub-model are obtained, and the interest point name word segmentation analysis sub-model is trained. This training process constitutes one round; the multi-task learning model is then trained round by round in this manner, and the trained multi-task learning model is obtained when the training ending condition is met.
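The round-by-round sampling described above can be sketched as follows: each round consumes a set number of training samples from each task's sample set in a fixed order (similarity, then industry classification, then word segmentation). The task names and sample tuples are hypothetical.

```python
def build_round(sample_sets, per_set):
    """One training round: take `per_set` samples from each task's
    sample set, in the dict's fixed task order, and consume them."""
    round_samples = []
    for task, samples in sample_sets.items():
        batch, sample_sets[task] = samples[:per_set], samples[per_set:]
        round_samples.extend((task, s) for s in batch)
    return round_samples

# Hypothetical per-task training sample sets (6 samples each).
sample_sets = {
    "similarity":   [("name_a", "name_b", 0)] * 6,
    "industry":     [("name_c", "food")] * 6,
    "segmentation": [("name_d", ["B", "I"])] * 6,
}
rounds = []
while any(sample_sets.values()):
    rounds.append(build_round(sample_sets, per_set=2))
```

With 6 samples per task and 2 consumed per round, this yields 3 rounds of 6 mixed samples each; in a real training loop each `(task, sample)` pair would drive one optimization step on the corresponding sub-model.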
In an alternative embodiment, the training sample is used to train the multi-task learning model, and when the training ending condition is met, the operation of obtaining the trained multi-task learning model includes:
S1, acquiring a first training sample set matched with the interest point name similarity recognition sub-model, and second training sample sets respectively corresponding to each interest point name associated feature recognition sub-model.
S2, acquiring a set number of training samples from the first training sample set and each second training sample set respectively, and sequentially adding the training samples into the newly-built multi-task training sample set.
S3, training the multi-task learning model by using the multi-task training sample set, and judging whether the multi-task learning model meets the training ending condition.
S4, if not, returning to execute the operation of acquiring the set number of training samples from the first training sample set and each second training sample set respectively until the training ending condition is met, and obtaining the trained multi-task learning model.
Wherein determining whether the multitask learning model satisfies a training end condition comprises: using a test sample set to carry out identification accuracy test on the interest point name similarity identification sub-model; and if the recognition accuracy of the interest point name similarity recognition sub-model exceeds a set threshold, determining that the training ending condition is met.
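The training ending condition described here can be sketched as an accuracy test of the similarity recognition sub-model on a held-out test sample set. The 0.95 threshold and the stand-in sub-model below are hypothetical; the patent only specifies that the accuracy must exceed a set threshold.

```python
def training_should_end(similarity_submodel, test_set, threshold=0.95):
    """End training once the similarity sub-model's recognition accuracy
    on a held-out test sample set exceeds the set threshold."""
    correct = sum(1 for (name_a, name_b, label) in test_set
                  if similarity_submodel(name_a, name_b) == label)
    accuracy = correct / len(test_set)
    return accuracy > threshold

# A hypothetical sub-model that always predicts "dissimilar" (label 0).
always_zero = lambda a, b: 0
test_set = [("a", "b", 0), ("c", "d", 0), ("e", "f", 1), ("g", "h", 0)]
# 3 of 4 predictions correct: accuracy 0.75, below the 0.95 threshold.
done = training_should_end(always_zero, test_set)
```

Only the similarity sub-model is tested because, as noted earlier, it is the only sub-model used at inference time.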
In the embodiment of the application, by constructing a multi-task model that shares a semantic representation layer and training it with sample data, the semantic representation layer of the resulting multi-task learning model can extract the semantic representation features of interest point names more accurately and comprehensively, thereby ensuring the accuracy of calculating the similarity between two interest point names.
Fig. 4 is a schematic structural diagram of an interest point name comparison device according to an embodiment of the present application, where the embodiment is applicable to a case of processing massive map interest point information data. The device can realize the interest point name comparison method according to any embodiment of the application. The apparatus 400 specifically includes the following:
The interest point name acquisition module 401 is configured to acquire a first interest point name and a second interest point name to be compared;
A similarity calculation module 402, configured to calculate a similarity between the first interest point name and the second interest point name according to semantic representation features corresponding to the first interest point name and the second interest point name, respectively;
Wherein the semantic representation features include: the interest point name semantic similarity features and at least one interest point name associated semantic feature;
The interest point name semantic similarity features are differentiated semantic features between the interest point names and other interest point names, and the interest point name associated semantic features are semantic features of other attributes associated with the interest point names.
Optionally, the semantic feature related to the interest point name includes: industry attribute characteristics of the interest point names and/or word attribute characteristics of each word in the interest point names;
The industry attribute features of the interest point names are semantic features of industry attributes of the industries to which the interest point names belong, and the word attribute features of the segmentation words in the interest point names are semantic features of word attributes to which the segmentation words in the interest point names belong.
Optionally, the similarity calculation module includes:
the input unit is used for respectively inputting the first interest point name and the second interest point name into a pre-trained multi-task learning model;
A similarity calculation unit, configured to obtain a similarity between the first interest point name and the second interest point name output by the multi-task learning model;
the multi-task learning model includes: the interest point name similarity recognition sub-model and at least one interest point name association feature recognition sub-model are arranged in parallel; the interest point name similarity recognition sub-model and each interest point name association characteristic recognition sub-model share the same semantic representation layer;
The semantic representation layer is used for acquiring the semantic representation features respectively corresponding to the first interest point name and the second interest point name; the interest point name similarity recognition sub-model is used for outputting similarity between the first interest point name and the second interest point name.
Optionally, the apparatus further includes:
the sample acquisition module is used for acquiring training samples respectively matched with the interest point name similarity recognition sub-model and each interest point name association feature recognition sub-model;
and the training module is used for training the multi-task learning model by using the training sample, and obtaining the trained multi-task learning model when the training ending condition is met.
Optionally, the interest point name similarity recognition sub-model specifically includes: the semantic representation layer and the cross-attention comparison layer;
The semantic representation layer is used for acquiring semantic representation features corresponding to the two input interest point names respectively, and inputting the semantic representation features to the cross attention comparison layer for similarity comparison;
The cross attention comparison layer is used for outputting the similarity between the two interest point names.
Optionally, the interest point name associated feature recognition sub-model includes an interest point name industry classification sub-model; the interest point name associated feature recognition sub-model comprises: the semantic representation layer and the full-connection classification layer;
the semantic representation layer is used for acquiring semantic representation features corresponding to the input interest point names, and inputting the semantic representation features into the fully-connected classification layer for industry classification;
and the full-connection classification layer is used for outputting an industry classification result of the interest point name.
Optionally, the interest point name association feature recognition sub-model further comprises an interest point name word segmentation analysis sub-model; the interest point name word segmentation analysis sub-model comprises: the semantic representation layer and the conditional random field layer;
The semantic representation layer is used for acquiring semantic representation features corresponding to the input interest point names, and inputting the semantic representation features to the conditional random field layer for identifying word attributes;
the conditional random field layer is used for outputting the word attribute of each word in the interest point name.
Optionally, the training module includes:
the acquisition unit is used for acquiring a first training sample set matched with the interest point name similarity recognition sub-model, and second training sample sets respectively corresponding to each interest point name associated feature recognition sub-model;
The selecting unit is used for respectively acquiring a set number of training samples from the first training sample set and each second training sample set, and sequentially adding the training samples into the newly-built multi-task training sample set;
The training unit is used for judging whether the multi-task learning model meets the training ending condition after training the multi-task learning model by using the multi-task training sample set;
and the judging unit is used for returning to execute the operation of respectively acquiring the set number of training samples from the first training sample set and each second training sample set if the training ending condition is not met, until the trained multi-task learning model is obtained when the training ending condition is met.
Optionally, determining whether the multitask learning model meets the training end condition includes:
Using a test sample set to carry out identification accuracy test on the interest point name similarity identification sub-model;
And if the recognition accuracy of the interest point name similarity recognition sub-model exceeds a set threshold, determining that the training ending condition is met.
The interest point name comparison device provided by the embodiment of the application can execute the interest point name comparison method provided by any embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. Reference is made to the description of any method embodiment of the application for details not described in this embodiment.
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, memory 502, and interfaces for connecting components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the electronic device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is illustrated in fig. 5.
Memory 502 is a non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the point of interest name comparison method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the point-of-interest name comparison method provided by the present application.
The memory 502 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the point-of-interest name acquisition module 401 and the similarity calculation module 402 shown in fig. 4) corresponding to the point-of-interest name comparison method in the embodiment of the application. The processor 501 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 502, i.e., implements the point-of-interest name comparison method in the method embodiments described above.
Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device implementing the point-of-interest name comparison method, and the like. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 may optionally include memory remotely located with respect to the processor 501, which may be connected via a network to an electronic device implementing the point of interest name comparison method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for implementing the method for comparing the names of the interest points can further comprise: an input device 503 and an output device 504. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus or otherwise, for example in fig. 5.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device implementing the point of interest name comparison method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, etc. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme provided by the embodiment of the application, the similarity of two interest point names is calculated according to the interest point name semantic similarity feature and at least one interest point name associated semantic feature; compared with calculating the similarity of two interest point names based on their literal feature representations alone, the accuracy of interest point name similarity calculation is improved.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed embodiments are achieved, and are not limited herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (16)

1.一种兴趣点名称比对方法,其特征在于,包括:1. A method for comparing names of points of interest, comprising: 获取待比对的第一兴趣点名称和第二兴趣点名称;Obtain the name of the first point of interest and the name of the second point of interest to be compared; 根据与第一兴趣点名称和第二兴趣点名称分别对应的语义表示特征,计算第一兴趣点名称与第二兴趣点名称之间的相似度;Calculating the similarity between the first point of interest name and the second point of interest name according to the semantic representation features corresponding to the first point of interest name and the second point of interest name respectively; 其中,所述语义表示特征中包括:兴趣点名称语义相似度特征,以及至少一个兴趣点名称关联语义特征;The semantic representation features include: a semantic similarity feature of a point of interest name, and at least one semantic feature associated with a point of interest name; 所述兴趣点名称语义相似度特征为兴趣点名称与其他兴趣点名称间的差异化语义特征,所述兴趣点名称关联语义特征为与兴趣点名称关联的其他属性的语义特征;The semantic similarity feature of the POI name is a differentiated semantic feature between the POI name and other POI names, and the associated semantic feature of the POI name is a semantic feature of other attributes associated with the POI name; 其中,所述兴趣点名称关联语义特征包括:兴趣点名称的行业属性特征和兴趣点名称中各分词的词语属性特征;The semantic features associated with the POI name include: industry attribute features of the POI name and word attribute features of each word in the POI name; 所述兴趣点名称的行业属性特征为兴趣点名称所属行业的行业属性的语义特征,所述兴趣点名称中各分词的词语属性特征为兴趣点名称中各分词所属的词语属性的语义特征;The industry attribute feature of the POI name is the semantic feature of the industry attribute of the industry to which the POI name belongs, and the word attribute feature of each participle in the POI name is the semantic feature of the word attribute to which each participle in the POI name belongs; 其中,根据与第一兴趣点名称和第二兴趣点名称分别对应的语义表示特征,计算第一兴趣点名称与第二兴趣点名称之间的相似度,包括:The method of calculating the similarity between the first point of interest name and the second point of interest name according to the semantic representation features corresponding to the first point of interest name and the second point of interest name 
respectively includes: 将第一兴趣点名称和第二兴趣点名称分别输入至预先训练的多任务学习模型中;Inputting the first point of interest name and the second point of interest name into a pre-trained multi-task learning model respectively; 获取所述多任务学习模型输出的所述第一兴趣点名称与第二兴趣点名称之间的相似度;Obtaining a similarity between the first point of interest name and the second point of interest name output by the multi-task learning model; 所述多任务学习模型包括:并行设置的兴趣点名称相似度识别子模型,以及至少一个兴趣点名称关联特征识别子模型;兴趣点名称相似度识别子模型与各兴趣点名称关联特征识别子模型共用同一语义表示层;所述兴趣点名称相似度识别子模型具体包括:所述语义表示层以及交叉注意力比较层;所述兴趣点名称关联特征识别子模型包括兴趣点名称行业分类子模型;所述兴趣点名称行业分类子模型包括:所述语义表示层以及全连接分类层;所述兴趣点名称关联特征识别子模型还包括兴趣点名称分词解析子模型;所述兴趣点名称分词解析子模型包括:所述语义表示层以及条件随机场层;The multi-task learning model includes: a point of interest name similarity recognition sub-model set in parallel, and at least one point of interest name associated feature recognition sub-model; the point of interest name similarity recognition sub-model and each point of interest name associated feature recognition sub-model share the same semantic representation layer; the point of interest name similarity recognition sub-model specifically includes: the semantic representation layer and the cross-attention comparison layer; the point of interest name associated feature recognition sub-model includes a point of interest name industry classification sub-model; the point of interest name industry classification sub-model includes: the semantic representation layer and the fully connected classification layer; the point of interest name associated feature recognition sub-model also includes a point of interest name word segmentation and parsing sub-model; the point of interest name word segmentation and parsing sub-model includes: the semantic representation layer and the conditional random field layer; 所述语义表示层,用于获取与第一兴趣点名称和第二兴趣点名称分别对应的所述语义表示特征;所述兴趣点名称相似度识别子模型,用于输出第一兴趣点名称和第二兴趣点名称之间的相似度。The semantic representation layer is used to obtain the semantic representation features corresponding to the first point of interest name and the second 
point of interest name respectively; the point of interest name similarity recognition sub-model is used to output the similarity between the first point of interest name and the second point of interest name. 2.根据权利要求1所述的方法,其特征在于,在获取待比对的第一兴趣点名称和第二兴趣点名称之前,还包括:2. The method according to claim 1, characterized in that before obtaining the first POI name and the second POI name to be compared, it also includes: 获取与兴趣点名称相似度识别子模型以及各兴趣点名称关联特征识别子模型分别匹配的训练样本;Obtain training samples that match the POI name similarity recognition sub-model and the POI name associated feature recognition sub-model respectively; 使用所述训练样本对所述多任务学习模型进行训练,并在满足训练结束条件时,得到训练后的所述多任务学习模型。The multi-task learning model is trained using the training samples, and when a training end condition is met, the trained multi-task learning model is obtained. 3.根据权利要求2所述的方法,其特征在于,3. The method according to claim 2, characterized in that 所述语义表示层用于获取与输入的两个兴趣点名称分别对应的语义表示特征,并将各所述语义表示特征输入至所述交叉注意力比较层进行相似度比对;The semantic representation layer is used to obtain semantic representation features corresponding to the two inputted POI names, and input each of the semantic representation features into the cross-attention comparison layer for similarity comparison; 所述交叉注意力比较层,用于输出两个兴趣点名称间的相似度。The cross attention comparison layer is used to output the similarity between the names of two points of interest. 4.根据权利要求2所述的方法,其特征在于,4. The method according to claim 2, characterized in that: 所述语义表示层,用于获取与输入的兴趣点名称对应的语义表示特征,并将所述语义表示特征输入至所述全连接分类层进行行业分类;The semantic representation layer is used to obtain semantic representation features corresponding to the inputted POI names, and input the semantic representation features into the fully connected classification layer for industry classification; 所述全连接分类层,用于输出所述兴趣点名称的行业分类结果。The fully connected classification layer is used to output the industry classification result of the point of interest name. 5.根据权利要求2所述的方法,其特征在于,5. 
The method according to claim 2, characterized in that: 所述语义表示层,用于获取与输入的兴趣点名称对应的语义表示特征,并将所述语义表示特征输入至所述条件随机场层进行词语属性的识别;The semantic representation layer is used to obtain semantic representation features corresponding to the inputted POI name, and input the semantic representation features into the conditional random field layer for word attribute recognition; 所述条件随机场层,用于输出所述兴趣点名称中各分词的词语属性。The conditional random field layer is used to output the word attributes of each word segment in the name of the point of interest. 6.根据权利要求2所述的方法,其特征在于,使用所述训练样本对所述多任务学习模型进行训练,并在满足训练结束条件时,得到训练后的所述多任务学习模型,包括:6. The method according to claim 2, characterized in that the multi-task learning model is trained using the training samples, and when the training end condition is met, the trained multi-task learning model is obtained, comprising: 获取与兴趣点名称相似度识别子模型匹配的第一训练样本集合,以及与各所述兴趣点名称关联特征识别子模型分别对应的第二训练样本集合;Acquire a first training sample set matching the interest point name similarity recognition sub-model, and a second training sample set corresponding to each of the interest point name associated feature recognition sub-models; 分别从所述第一训练样本集合和各所述第二训练样本集合中获取设定数量的训练样本,并顺序加入至新建的多任务训练样本集合中;Respectively obtaining a set number of training samples from the first training sample set and each of the second training sample sets, and sequentially adding them to the newly created multi-task training sample set; 使用所述多任务训练样本集合对所述多任务学习模型进行训练后,判断所述多任务学习模型是否满足训练结束条件;After training the multi-task learning model using the multi-task training sample set, determining whether the multi-task learning model meets a training end condition; 若否,则返回执行分别从所述第一训练样本集合和各所述第二训练样本集合中获取设定数量的训练样本的操作,直至在满足训练结束条件时,得到训练后的所述多任务学习模型。If not, return to execute the operation of obtaining a set number of training samples from the first training sample set and each of the second training sample sets respectively, until the trained multi-task learning model is obtained when the training end condition is met. 
7. The method according to claim 6, characterized in that determining whether the multi-task learning model meets the training end condition comprises:

testing the recognition accuracy of the point of interest name similarity recognition sub-model using a test sample set;

if the recognition accuracy of the point of interest name similarity recognition sub-model exceeds a set threshold, determining that the training end condition is met.

8. A point of interest name comparison apparatus, characterized by comprising:

a point of interest name acquisition module, used to obtain a first point of interest name and a second point of interest name to be compared;

a similarity calculation module, used to calculate the similarity between the first point of interest name and the second point of interest name according to semantic representation features respectively corresponding to the first point of interest name and the second point of interest name;

wherein the semantic representation features include: a point of interest name semantic similarity feature, and at least one point of interest name associated semantic feature;

the point of interest name semantic similarity feature is a differentiated semantic feature between a point of interest name and other point of interest names, and the point of interest name associated semantic feature is a semantic feature of other attributes associated with the point of interest name;

wherein the point of interest name associated semantic features include: an industry attribute feature of the point of interest name and a word attribute feature of each word segment in the point of interest name;
the industry attribute feature of the point of interest name is a semantic feature of the industry attribute of the industry to which the point of interest name belongs, and the word attribute feature of each word segment in the point of interest name is a semantic feature of the word attribute to which each word segment in the point of interest name belongs;

wherein the similarity calculation module includes:

an input unit, used to input the first point of interest name and the second point of interest name respectively into a pre-trained multi-task learning model;

a similarity calculation unit, used to obtain the similarity between the first point of interest name and the second point of interest name output by the multi-task learning model;

the multi-task learning model includes: a point of interest name similarity recognition sub-model and at least one point of interest name associated feature recognition sub-model arranged in parallel; the point of interest name similarity recognition sub-model and each point of interest name associated feature recognition sub-model share the same semantic representation layer; the point of interest name similarity recognition sub-model specifically includes: the semantic representation layer and a cross-attention comparison layer; the point of interest name associated feature recognition sub-model includes a point of interest name industry classification sub-model, which includes: the semantic representation layer and a fully connected classification layer; the point of interest name associated feature recognition sub-model also includes a point of interest name word segmentation parsing sub-model, which includes: the semantic representation layer and a conditional random field layer;

the semantic representation layer is used to obtain the semantic representation features respectively corresponding to the first point of interest name and the second point of interest name; the point of interest name similarity recognition sub-model is used to output the similarity between the first point of interest name and the second point of interest name.

9. The apparatus according to claim 8, characterized in that the apparatus further comprises:

a sample acquisition module, used to obtain training samples respectively matching the point of interest name similarity recognition sub-model and each point of interest name associated feature recognition sub-model;

a training module, used to train the multi-task learning model using the training samples, and to obtain the trained multi-task learning model when a training end condition is met.

10. The apparatus according to claim 9, characterized in that:

the semantic representation layer is used to obtain the semantic representation features respectively corresponding to the two input point of interest names, and to input each of the semantic representation features into the cross-attention comparison layer for similarity comparison;

the cross-attention comparison layer is used to output the similarity between the two point of interest names.

11. The apparatus according to claim 9, characterized in that:

the semantic representation layer is used to obtain the semantic representation feature corresponding to the input point of interest name, and to input the semantic representation feature into the fully connected classification layer for industry classification;

the fully connected classification layer is used to output the industry classification result of the point of interest name.

12. The apparatus according to claim 9, characterized in that:

the semantic representation layer is used to obtain the semantic representation feature corresponding to the input point of interest name, and to input the semantic representation feature into the conditional random field layer for word attribute recognition;

the conditional random field layer is used to output the word attribute of each word segment in the point of interest name.

13. The apparatus according to claim 9, characterized in that the training module comprises:

an acquisition unit, used to obtain a first training sample set matching the point of interest name similarity recognition sub-model, and second training sample sets respectively corresponding to each of the point of interest name associated feature recognition sub-models;

a selection unit, used to obtain a set number of training samples from the first training sample set and from each of the second training sample sets respectively, and to sequentially add them to a newly created multi-task training sample set;

a training unit, used to determine, after training the multi-task learning model using the multi-task training sample set, whether the multi-task learning model meets a training end condition;

a judgment unit, used to return, if the training end condition is not met, to the operation of obtaining the set number of training samples from the first training sample set and from each of the second training sample sets respectively, until the trained multi-task learning model is obtained when the training end condition is met.

14. The apparatus according to claim 13, characterized in that determining whether the multi-task learning model meets the training end condition comprises:

testing the recognition accuracy of the point of interest name similarity recognition sub-model using a test sample set;

if the recognition accuracy of the point of interest name similarity recognition sub-model exceeds a set threshold, determining that the training end condition is met.

15. An electronic device, characterized by comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to execute the point of interest name comparison method according to any one of claims 1-7.

16. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause a computer to execute the point of interest name comparison method according to any one of claims 1-7.
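The apparatus of claim 8 recites a multi-task learning model in which a cross-attention similarity head, a fully connected industry-classification head, and a conditional-random-field word-attribute head all share one semantic representation layer. A heavily simplified, illustrative sketch of that topology follows; the toy embedding-lookup encoder, all dimensions, the softmax cross-attention, and the per-token argmax stand-in for a full CRF (which would add tag-transition scores and Viterbi decoding) are assumptions of this sketch, not details taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, N_INDUSTRIES, N_TAGS = 1000, 16, 8, 5

# Shared semantic representation layer (claims 3-5, 8): here a toy
# embedding lookup; the patent leaves the encoder itself unspecified.
EMB = rng.normal(size=(VOCAB, DIM))

def semantic_repr(token_ids):
    """Per-token semantic representation features, shape (len, DIM)."""
    return EMB[np.array(token_ids)]

# Similarity head: cross-attention comparison layer (claims 1, 3, 10).
def similarity(ids_a, ids_b):
    a, b = semantic_repr(ids_a), semantic_repr(ids_b)
    attn = a @ b.T                                        # cross-attention scores
    weights = np.exp(attn) / np.exp(attn).sum(axis=1, keepdims=True)
    aligned = weights @ b                                 # b-side summary per a-token
    u, v = a.mean(axis=0), aligned.mean(axis=0)
    # cosine similarity of the pooled representations
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Industry-classification head: fully connected layer (claims 4, 11).
W_CLS = rng.normal(size=(DIM, N_INDUSTRIES))
def industry_logits(ids):
    return semantic_repr(ids).mean(axis=0) @ W_CLS        # one score per industry

# Word-attribute head: stand-in for the conditional random field layer
# (claims 5, 12); emits one word-attribute tag per token.
W_TAG = rng.normal(size=(DIM, N_TAGS))
def word_attributes(ids):
    return (semantic_repr(ids) @ W_TAG).argmax(axis=1)
```

Because all three heads consume the same `semantic_repr` output, training the classification and word-attribute tasks shapes the shared representation that the similarity head relies on, which is the stated motivation for the parallel sub-model design.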
CN202010074812.6A 2020-01-22 2020-01-22 Method, device, electronic device and storage medium for comparing names of points of interest Active CN113157829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010074812.6A CN113157829B (en) 2020-01-22 2020-01-22 Method, device, electronic device and storage medium for comparing names of points of interest


Publications (2)

Publication Number Publication Date
CN113157829A (en) 2021-07-23
CN113157829B (en) 2024-11-26

Family

ID=76881668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010074812.6A Active CN113157829B (en) 2020-01-22 2020-01-22 Method, device, electronic device and storage medium for comparing names of points of interest

Country Status (1)

Country Link
CN (1) CN113157829B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3860163B1 (en) * 2020-01-30 2023-07-12 HERE Global B.V. Matching location-related information with name information of points of interest
CN114065952B * 2021-11-29 2024-11-26 Tencent Technology (Shenzhen) Co., Ltd. Model training method and device
CN114595303A * 2022-03-07 2022-06-07 Alibaba (China) Co., Ltd. Data processing system, method and electronic device
CN114625984B * 2022-03-31 2023-07-25 Beijing Baidu Netcom Science and Technology Co., Ltd. Point-of-interest verification method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489507A * 2019-08-16 2019-11-22 Tencent Technology (Shenzhen) Co., Ltd. Method, apparatus, computer device and storage medium for determining point of interest similarity
CN110674260A * 2019-09-27 2020-01-10 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method and device of semantic similarity model, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8204886B2 (en) * 2009-11-06 2012-06-19 Nokia Corporation Method and apparatus for preparation of indexing structures for determining similar points-of-interests
US20110313954A1 (en) * 2010-06-18 2011-12-22 Microsoft Corporation Community model based point of interest local search
US10089576B2 (en) * 2015-07-28 2018-10-02 Microsoft Technology Licensing, Llc Representation learning using multi-task deep neural networks
US10963504B2 (en) * 2016-02-12 2021-03-30 Sri International Zero-shot event detection using semantic embedding
CN110516094A * 2019-08-29 2019-11-29 Baidu Online Network Technology (Beijing) Co., Ltd. Deduplication method and device for same-class point-of-interest data, electronic device and storage medium


Also Published As

Publication number Publication date
CN113157829A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN111709247B (en) Data set processing method and device, electronic equipment and storage medium
CN113157829B (en) Method, device, electronic device and storage medium for comparing names of points of interest
CN111259671B (en) Semantic description processing method, device and equipment for text entity
CN111767359B (en) Point-of-interest classification method, device, equipment and storage medium
CN111626119A (en) Target recognition model training method, device, equipment and storage medium
CN111369602A (en) Method, device, electronic device and readable storage medium for processing point cloud data
CN111177355A (en) Human-computer dialogue interaction method, device and electronic device based on search data
CN112241715A (en) Model training method, expression recognition method, device, equipment and storage medium
US20220044559A1 Method and apparatus for outputting vehicle flow direction, roadside device, and cloud control platform
CN113780098B (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN111539347B (en) Method and device for detecting target
CN114444462B (en) Model training method and man-machine interaction method and device
CN111241838B (en) Semantic relation processing method, device and equipment for text entity
CN111667056A (en) Method and apparatus for searching model structure
CN111863036B (en) Voice detection method and device
CN114881129A (en) Model training method and device, electronic equipment and storage medium
CN111611808B (en) Method and apparatus for generating natural language model
CN112507090A (en) Method, apparatus, device and storage medium for outputting information
CN113158030B (en) Recommendation method and device for remote interest points, electronic equipment and storage medium
CN113269213A (en) Training set acquisition method and device and electronic equipment
CN109034199B (en) Data processing method and device, storage medium and electronic equipment
CN111860580A (en) Recognition model obtaining and category recognition method, device and storage medium
CN111782979B (en) Brand classification method, device and equipment for interest points and storage medium
CN112559715B (en) Attitude identification method, device, equipment and storage medium
CN112328896B (en) Method, apparatus, electronic device, and medium for outputting information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant