
CN119830906B - Method and device for determining sensitivity of geographic information vocabulary - Google Patents

Method and device for determining sensitivity of geographic information vocabulary

Info

Publication number
CN119830906B
Authority
CN
China
Prior art keywords
geographic information
vocabulary
word
database
phrase
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411917340.0A
Other languages
Chinese (zh)
Other versions
CN119830906A (en)
Inventor
陈会仙
章炜
闫春利
杨殿阁
杨蒙蒙
程晓茜
王心宇
杨廷超
李墨
朱大伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Navinfo Co Ltd
Original Assignee
Tsinghua University
Navinfo Co Ltd
Application filed by Tsinghua University and Navinfo Co Ltd
Priority to CN202411917340.0A
Publication of CN119830906A
Application granted
Publication of CN119830906B
Legal status: Active


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


A method and device for determining the sensitivity of a geographic information vocabulary. The method comprises: obtaining a target geographic information vocabulary; determining whether the target geographic information vocabulary hits a pre-established geographic information vocabulary database; if not, obtaining a pre-established geographic information phrase database, which includes a plurality of geographic information phrases and their label information, wherein a plurality of geographic information phrases can form a geographic information vocabulary, and the label information of a geographic information phrase indicates the probability that a geographic information vocabulary containing that phrase is a sensitive or non-sensitive vocabulary; segmenting the target geographic information vocabulary into a plurality of word segments; matching each word segment against the geographic information phrases in the phrase database; and determining the sensitivity of the target geographic information vocabulary based on the label information of each geographic information phrase that successfully matches a word segment.

Description

Method and device for determining sensitivity of geographic information vocabulary
Technical Field
The application relates to the technical field of data security, and in particular to a method and a device for determining the sensitivity of geographic information vocabularies.
Background
In the modern information society, geographic information is widely used in scenarios such as navigation, location services, social networks, and smart cities. However, some geographic information may involve sensitive data such as personal privacy, national security, or business secrets, so determining whether geographic information is sensitive, and then taking corresponding protective measures, has become an important subject in both research and technical application. In the related art, a database of sensitive geographic information is generally established, the geographic information to be assessed is matched against the entries in the database, and whether it is sensitive is determined from the matching result. However, geographic information is dynamic and time-varying: it is continuously updated as time, place, or environment changes. It is therefore difficult to build a database of sensitive geographic information that is both timely and comprehensive, and the resulting sensitivity judgments are not accurate enough.
Disclosure of Invention
In a first aspect, an embodiment of the application provides a method for determining the sensitivity of a geographic information vocabulary. The method comprises: obtaining a target geographic information vocabulary; determining whether the target geographic information vocabulary hits a pre-established geographic information vocabulary database, wherein the database comprises a plurality of geographic information vocabularies, each corresponding to a point of interest that represents a unique and definite geographic location, and the vocabularies include sensitive vocabularies and non-sensitive vocabularies; if the target geographic information vocabulary does not hit the vocabulary database, obtaining a pre-established geographic information phrase database comprising a plurality of geographic information phrases and their label information, wherein a plurality of geographic information phrases can form a geographic information vocabulary, and the label information of a geographic information phrase represents the probability that a geographic information vocabulary containing that phrase is a sensitive or non-sensitive vocabulary; and segmenting the target geographic information vocabulary into a plurality of word segments, matching each word segment against the geographic information phrases in the phrase database, and determining the sensitivity of the target geographic information vocabulary based on the label information of each geographic information phrase that successfully matches a word segment.
In a second aspect, an embodiment of the present application provides a device for determining the sensitivity of a geographic information vocabulary. The device comprises: a first obtaining module configured to obtain a target geographic information vocabulary; a determining module configured to determine whether the target geographic information vocabulary hits a pre-established geographic information vocabulary database, wherein the database comprises a plurality of geographic information vocabularies, each corresponding to a point of interest that represents a unique and definite geographic location, and the vocabularies include sensitive vocabularies and non-sensitive vocabularies; a second obtaining module configured to obtain a pre-established geographic information phrase database if the target geographic information vocabulary does not hit the vocabulary database, wherein the phrase database comprises a plurality of geographic information phrases and their label information, and the label information of a geographic information phrase represents the probability that a geographic information vocabulary containing that phrase is a sensitive or non-sensitive vocabulary; and a matching module configured to segment the target geographic information vocabulary into a plurality of word segments, match each word segment against the geographic information phrases in the phrase database, and determine the sensitivity of the target geographic information vocabulary based on the label information of each geographic information phrase that successfully matches a word segment.
In a third aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method according to any embodiment of the present application when the processor executes the computer program.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present application.
The embodiments of the application establish two databases of different data granularity. The coarser-grained database is the geographic information vocabulary database, which comprises a plurality of geographic information vocabularies; each vocabulary corresponds to one point of interest and represents a unique and definite geographic location. The finer-grained database is the geographic information phrase database, which comprises a plurality of geographic information phrases and their label information. Geographic information phrases are parts of geographic information vocabularies and can be combined to form them, and the label information represents the probability that a geographic information vocabulary containing the phrase is a sensitive or non-sensitive vocabulary. To determine the sensitivity of a target geographic information vocabulary, it is first determined whether the target vocabulary hits the vocabulary database. If not, the word segments of the target vocabulary are matched against the geographic information phrases in the phrase database, and the sensitivity of the target vocabulary is determined from the label information of each phrase that successfully matches a word segment.
Because a geographic information phrase is only part of a geographic information vocabulary, its granularity is smaller than that of the vocabulary, and phrases can serve as the elements from which vocabularies are formed. Even if a complete geographic information vocabulary is not recorded in the vocabulary database, its sensitivity can still be estimated from the elements that make it up, instead of simply treating every vocabulary missing from the vocabulary database as non-sensitive. This improves the accuracy of the sensitivity determination for geographic information vocabularies.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart of a method for determining the sensitivity of a geographic information vocabulary according to an embodiment of the present application.
FIG. 2 is a schematic diagram of a geographic information vocabulary database according to an embodiment of the present application.
FIG. 3 is a schematic diagram of a geographic information phrase database according to an embodiment of the present application.
FIG. 4 is a schematic diagram of a system architecture according to an embodiment of the present application.
FIG. 5 is a schematic diagram of how a comparison table for a weak positive word library is generated according to an embodiment of the present application.
FIG. 6 is a schematic diagram of the overall flow of an embodiment of the present application.
FIG. 7 is a block diagram of a device for determining the sensitivity of a geographic information vocabulary according to an embodiment of the present application.
FIG. 8 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of the application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the term "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
In order to better understand the technical solution in the embodiments of the present application and make the above objects, features and advantages of the embodiments of the present application more comprehensible, the technical solution in the embodiments of the present application is described in further detail below with reference to the accompanying drawings.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and be provided with corresponding operation entries for the user to select authorization or rejection.
Many application scenarios involve acquiring geographic information, which may include sensitive information. To improve data security, the sensitivity of geographic information needs to be determined. For example, in an intelligent driving scenario, on-board sensors (e.g., GPS, radar, and cameras) may collect geographic information about the surrounding environment for route planning and obstacle avoidance. The collected geographic information may include the names of classified institutions, which are not suitable for disclosure; such names are therefore identified through sensitivity determination, filtered out of the collected data, and desensitized.
In the related art, a database of sensitive geographic information is generally established, the geographic information to be assessed is matched against the entries in the database, and whether it is sensitive is determined from the matching result. For example, if the database of sensitive geographic information includes "XX military base" and the geographic information to be assessed also contains "XX military base", the two match successfully, and the geographic information is judged to be sensitive.
However, geographic information may be continuously updated as time, place, or environment changes; for example, as national circumstances develop, a country may establish new administrative institutions at classified locations. Updates to the database of sensitive geographic information lag behind updates to the geographic information itself. If the geographic information corresponding to a newly established institution is not recorded in the database in time, the sensitivity of the geographic information actually acquired cannot be determined accurately by database matching alone.
Based on this, the present application provides a method for determining the sensitivity of a geographic information vocabulary. Referring to FIG. 1, the method includes:
Step S11, obtaining a target geographic information vocabulary;
Step S12, determining whether the target geographic information vocabulary hits a pre-established geographic information vocabulary database, wherein the geographic information vocabulary database comprises a plurality of geographic information vocabularies, each geographic information vocabulary corresponds to a point of interest, the point of interest represents a unique and definite geographic location, and the geographic information vocabularies include sensitive vocabularies and non-sensitive vocabularies;
Step S13, if the target geographic information vocabulary does not hit the geographic information vocabulary database, obtaining a pre-established geographic information phrase database, wherein the geographic information phrase database comprises a plurality of geographic information phrases and their label information, a plurality of geographic information phrases can form a geographic information vocabulary, and the label information of a geographic information phrase represents the probability that a geographic information vocabulary containing that phrase is a sensitive or non-sensitive vocabulary;
Step S14, segmenting the target geographic information vocabulary to obtain a plurality of word segments, matching each word segment against the geographic information phrases in the geographic information phrase database, and determining the sensitivity of the target geographic information vocabulary based on the label information of each geographic information phrase that successfully matches a word segment.
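Steps S11 to S14 can be sketched as a two-tier lookup: an exact match against the vocabulary database first, then a fallback to the phrase database. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `determine_sensitivity`, the pluggable `segment` callable, the `threshold` of 0.5, the rule that the highest matched label probability decides, and the "undetermined" result when no phrase matches are all hypothetical choices made for the example.

```python
from typing import Callable, Dict, List, Set


def determine_sensitivity(
    target: str,
    sensitive_vocab: Set[str],
    non_sensitive_vocab: Set[str],
    phrase_labels: Dict[str, float],
    segment: Callable[[str], List[str]],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> str:
    # Step S12: exact match against the geographic information vocabulary database.
    if target in sensitive_vocab:
        return "sensitive"
    if target in non_sensitive_vocab:
        return "non-sensitive"
    # Steps S13-S14: segment the target and look up each segment in the phrase database.
    probs = [phrase_labels[s] for s in segment(target) if s in phrase_labels]
    if not probs:
        return "undetermined"  # assumption: no phrase matched, so no estimate
    # Assumption: the highest label probability among matched phrases decides.
    return "sensitive" if max(probs) >= threshold else "non-sensitive"
```

For instance, with `str.split` as a toy segmenter and a phrase database `{"army": 0.7}`, the unrecorded vocabulary "XX army depot" misses the vocabulary database but is still judged sensitive through its "army" segment, rather than defaulting to non-sensitive.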
Considering that sensitive geographic information vocabularies (i.e., sensitive vocabularies) often contain specific elements, the present application builds a geographic information vocabulary database from complete geographic information vocabularies and a geographic information phrase database from the elements that make up those vocabularies (i.e., geographic information phrases). When the target geographic information vocabulary fails to hit the vocabulary database, it is segmented, the resulting word segments are matched against the elements that make up geographic information vocabularies, and the sensitivity of the target vocabulary is determined at the element level. Compared with relying solely on the vocabulary database, this can effectively improve the accuracy of sensitivity determination when the vocabulary database is not updated in time. Specific implementations of the application are described below.
In step S11, a target geographic information vocabulary may be obtained. The target geographic information vocabulary corresponds to a point of interest (POI), which represents a unique and definite geographic location. For example, the target geographic information vocabulary may be an address, such as "No. XX, XX Street, XX District, XX City", or the name of a unique and definite landmark or building, such as "the Forbidden City" or "the Eiffel Tower".
The way the target geographic information vocabulary is obtained may differ between application scenarios. For example, in an intelligent driving scenario, it may be collected by on-board sensors. In an intelligent traffic management system, surveillance cameras installed at intersections or on highways, combined with computer vision, may be used to recognize information such as traffic flow and road conditions, from which the target geographic information vocabulary is obtained. In map and location service applications, it may be obtained by calling the interface of an open map service.
In step S12, a pre-established geographic information vocabulary database may be obtained. The database includes a plurality of geographic information vocabularies, each corresponding to a point of interest, and these vocabularies may include sensitive vocabularies and non-sensitive vocabularies. A sensitive vocabulary is a geographic information vocabulary that is sensitive and not suitable for disclosure; a non-sensitive vocabulary is one that is not sensitive and may be disclosed.
In some embodiments, tag information may be added to each geographic information word in the geographic information word database to indicate whether the geographic information word is a sensitive word or a non-sensitive word. In other embodiments, different geographic information vocabulary databases may be established for sensitive vocabulary and non-sensitive vocabulary, respectively. For example, a positive vocabulary database and a negative vocabulary database may be established, wherein the geographic information vocabularies included in the positive vocabulary database are all non-sensitive vocabularies, and the geographic information vocabularies in the negative vocabulary database are all sensitive vocabularies.
The geographic information vocabulary database may also be updated at a certain frequency, for example, by deleting an existing geographic information vocabulary in the geographic information vocabulary database, adding a new geographic information vocabulary in the geographic information vocabulary database, and/or changing the type of geographic information vocabulary, such as by migrating a non-sensitive vocabulary in the positive vocabulary database to the negative vocabulary database.
In some embodiments, the geographic information vocabulary database further includes version information indicating its current version. Different versions of the database may have different contents, and the version information makes it easy to manage each updated version.
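The update operations and version information described above (deleting entries, adding entries, migrating an entry from the positive to the negative vocabulary database, and tracking the version) can be modeled in memory as below. This is only a sketch: the class `VocabularyDatabase`, the method `apply_update`, and the choice to bump the version once per batch of changes are assumptions for illustration, not details given in the source.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class VocabularyDatabase:
    """Hypothetical in-memory model of the vocabulary database with version info."""

    version: int = 1
    positive: Set[str] = field(default_factory=set)  # non-sensitive vocabularies
    negative: Set[str] = field(default_factory=set)  # sensitive vocabularies

    def apply_update(
        self,
        add_non_sensitive: Iterable[str] = (),
        add_sensitive: Iterable[str] = (),
        delete: Iterable[str] = (),
        reclassify_as_sensitive: Iterable[str] = (),
    ) -> None:
        """Apply one batch of changes and bump the version once per batch."""
        for vocab in delete:
            self.positive.discard(vocab)
            self.negative.discard(vocab)
        self.positive.update(add_non_sensitive)
        self.negative.update(add_sensitive)
        # Migrate entries from the positive database to the negative database.
        for vocab in reclassify_as_sensitive:
            if vocab in self.positive:
                self.positive.remove(vocab)
                self.negative.add(vocab)
        self.version += 1
```

Bumping the version per batch, rather than per single change, keeps one version number per published database file, which matches the idea of updating at a certain frequency.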
In some embodiments, the geographic information vocabulary database further includes a file flag indicating the validity of the database. In use, the file flag in the database may be read first and compared with a preset valid file flag. If they match, the database is a valid geographic information vocabulary database, and the sensitivity of the target geographic information vocabulary can be determined based on it.
In some embodiments, each geographic information vocabulary in the geographic information vocabulary database may be encrypted. Specifically, the vocabulary may be converted by an encryption technique into an irreversible unique numeric identifier. Encryption means include, but are not limited to, hash algorithms, symmetric encryption algorithms, and asymmetric encryption algorithms. Because the geographic information vocabularies in the negative vocabulary database are not suitable for disclosure, only those vocabularies may be encrypted; alternatively, the vocabularies in both the positive and the negative vocabulary databases may be encrypted.
The geographic information vocabulary database of some embodiments is shown in FIG. 2. It comprises header information and table data: the header information comprises the file flag and file version information, and the table data records the geographic information vocabularies, each in one-to-one correspondence with a unique numeric identifier. Where the geographic information vocabulary database includes a positive vocabulary database and a negative vocabulary database, both may adopt the data structure shown in FIG. 2.
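The header-plus-table layout and the file-flag validity check can be illustrated with Python's standard `struct` module. The source does not specify a binary format, so the header layout (`HEADER_FORMAT`, a 4-byte flag followed by a little-endian 32-bit version), the flag value `VALID_FILE_FLAG`, and the helper names below are all hypothetical.

```python
import struct
from typing import Dict, List, Optional, Tuple

# Hypothetical header layout: a 4-byte file flag followed by a little-endian uint32 version.
HEADER_FORMAT = "<4sI"
VALID_FILE_FLAG = b"GEOV"  # hypothetical preset valid file flag


def build_database(version: int, vocabularies: List[str]) -> Tuple[bytes, Dict[int, str]]:
    """Pack the header and assign each vocabulary a unique numeric identifier."""
    header = struct.pack(HEADER_FORMAT, VALID_FILE_FLAG, version)
    table = {uid: vocab for uid, vocab in enumerate(vocabularies, start=1)}
    return header, table


def read_version_if_valid(header: bytes) -> Optional[int]:
    """Return the version if the file flag matches the preset valid flag, else None."""
    flag, version = struct.unpack_from(HEADER_FORMAT, header)
    return version if flag == VALID_FILE_FLAG else None
```

A loader would call `read_version_if_valid` before using the table and refuse the file when it returns `None`.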
After the target geographic information vocabulary is obtained, it may be determined whether the target geographic information vocabulary hits the geographic information vocabulary database. If the target geographic information vocabulary is the same as any geographic information vocabulary in the geographic information vocabulary database, the target geographic information vocabulary can be determined to hit the geographic information vocabulary database. If the target geographic information vocabulary is different from each geographic information vocabulary in the geographic information vocabulary database, the target geographic information vocabulary can be determined to miss the geographic information vocabulary database.
In the example where the geographic information vocabulary database includes a positive vocabulary database and a negative vocabulary database, it may first be determined whether the target geographic information vocabulary hits the positive vocabulary database and, if not, whether it hits the negative vocabulary database. Alternatively, the negative vocabulary database may be checked first, followed by the positive vocabulary database. Or the two checks may be performed concurrently.
If the target geographic information vocabulary hits the geographic information vocabulary database, the sensitivity of the target geographic information vocabulary can be determined based on the type of the geographic information vocabulary hit by the target geographic information vocabulary in the geographic information vocabulary database. For example, if the type of the hit geographic information vocabulary is a sensitive vocabulary, the target geographic information vocabulary may be determined to be a sensitive vocabulary. If the type of the hit geographic information vocabulary is a non-sensitive vocabulary, the target geographic information vocabulary can be determined to be the non-sensitive vocabulary.
In the example where the geographic information vocabulary database includes a positive vocabulary database and a negative vocabulary database, if the target geographic information vocabulary hits a non-sensitive vocabulary in the positive vocabulary database, it may be directly determined to be a non-sensitive vocabulary. Similarly, if it hits a sensitive vocabulary in the negative vocabulary database, it may be directly determined to be a sensitive vocabulary.
If the target geographic information vocabulary hits neither the positive vocabulary database nor the negative vocabulary database, it is determined that the target vocabulary does not hit the geographic information vocabulary database, and step S13 may then be executed.
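The concurrent variant of the positive/negative lookup can be sketched with `concurrent.futures.ThreadPoolExecutor`, which is useful when each lookup is I/O-bound (for example, a query against a separate store). The function name `lookup_concurrently`, the injected lookup callables, and the rule that a negative hit wins if both databases report a hit are assumptions for illustration; a return value of `None` stands for "missed both databases, proceed to step S13".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def lookup_concurrently(
    target: str,
    in_positive: Callable[[str], bool],
    in_negative: Callable[[str], bool],
) -> Optional[str]:
    """Query both vocabulary databases in parallel; None means proceed to step S13."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        positive_future = pool.submit(in_positive, target)
        negative_future = pool.submit(in_negative, target)
        hit_positive = positive_future.result()
        hit_negative = negative_future.result()
    # Assumption: if both report a hit, the conservative (sensitive) answer wins.
    if hit_negative:
        return "sensitive"
    if hit_positive:
        return "non-sensitive"
    return None
```

With in-memory sets, `set.__contains__` can serve as the lookup callable; the sequential orders described above are the same logic without the thread pool.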
In some embodiments, each geographic information word in the geographic information word database is pre-encrypted. In this case, the target geographic information vocabulary may be encrypted based on the encryption manner of each geographic information vocabulary in the geographic information vocabulary database, and then it may be determined whether the encrypted target geographic information vocabulary hits the pre-established geographic information vocabulary database. In this way, the risk of leakage of the geographic information vocabulary in the geographic information vocabulary database can be reduced.
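Matching an encrypted target against a pre-encrypted database works as long as both sides use the same transformation. The sketch below uses SHA-256 from Python's `hashlib` as one concrete irreversible hash; the source allows other means, so SHA-256, the UTF-8 encoding, and the helper names `digest` and `hits` are assumptions. Because place names have low entropy, an unkeyed hash can be reversed by dictionary guessing; a keyed hash such as `hmac` with a secret key is a common hardening step.

```python
import hashlib
from typing import Iterable, Set


def digest(vocab: str) -> str:
    """Irreversible identifier for a vocabulary (assumed: SHA-256 over UTF-8)."""
    return hashlib.sha256(vocab.encode("utf-8")).hexdigest()


def build_digest_set(vocabularies: Iterable[str]) -> Set[str]:
    """Store only digests, never the plaintext vocabularies."""
    return {digest(v) for v in vocabularies}


def hits(target: str, digests: Set[str]) -> bool:
    """Encrypt the target the same way, then compare digests."""
    return digest(target) in digests
```

The target must be normalized exactly as the stored vocabularies were (same text form and encoding), otherwise identical names produce different digests.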
In step S13, a pre-established geographic information phrase database may be obtained, which includes a plurality of geographic information phrases and their label information. Geographic information phrases are the elements that make up geographic information vocabularies, and a plurality of geographic information phrases can form a geographic information vocabulary. A geographic information vocabulary includes, but is not limited to, at least some elements representing a country, province, city, district, street, building information (e.g., the name or number of a building), scenic spot name, and the like. For example, one geographic information vocabulary may be "XX Park on XX Street, XX City XX, XX Province". Each geographic information phrase may include one or more of these elements; for example, one phrase may be "XX Park", another "XX Street, XX City", and yet another "XX City, XX Province". A geographic information phrase corresponds to one or more regions, each of which may contain multiple points of interest; that is, the geographic location corresponding to a geographic information phrase is neither unique nor definite. For example, the phrase "XX Park" may refer to an XX Park in City A or an XX Park in City B. As another example, the phrase "XX City, XX Province" covers many points of interest in that city, such as government buildings, markets, and zoos.
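Since phrases such as "XX Province XX City" span several tokens, step S14's segmentation has to be able to emit multi-token segments that line up with phrase-database entries. The source does not name a segmentation algorithm; the sketch below swaps in forward maximum matching, a classic dictionary-based method, over pre-tokenized input. Operating on whitespace tokens (real Chinese text would be matched character by character) and the function name `forward_max_match` are assumptions.

```python
from typing import List, Set, Tuple


def forward_max_match(tokens: List[str], phrases: Set[Tuple[str, ...]]) -> List[str]:
    """Greedy forward maximum matching of tokens against known phrases."""
    max_len = max((len(p) for p in phrases), default=1)
    segments: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first; a single token is always accepted as a fallback.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if n == 1 or candidate in phrases:
                segments.append(" ".join(candidate))
                i += n
                break
    return segments
```

For example, with the phrases ("XX", "Province", "XX", "City"), ("XX", "Street") and ("XX", "Park"), the input "XX Province XX City XX Street XX Park" is split into "XX Province XX City", "XX Street", and "XX Park".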
Because the geographic location corresponding to a geographic information phrase is neither unique nor certain, it is not possible to determine directly whether the phrase itself is sensitive or non-sensitive. However, phrases are the elements that make up geographic information vocabularies, and sensitive vocabularies generally contain specific elements such as "army" or "government building", so the probability that a vocabulary containing a given phrase is sensitive can be approximately inferred. That is, tag information may be established for each phrase to represent the probability that a geographic information vocabulary containing that phrase is a sensitive or a non-sensitive vocabulary. For example, a vocabulary containing the phrase "army" has a high probability of being sensitive, so the probability represented by the tag information of "army" is also high, e.g., 0.7; a vocabulary containing the phrase "noodle restaurant" has a low probability of being sensitive, so the probability represented by its tag information is also low, e.g., 0.1. These values are merely exemplary and do not limit the present disclosure.
In some embodiments, the geographic information phrase database may include a positive phrase database and a negative phrase database, where the probability represented by the tag information of the phrases in the positive phrase database is greater than that represented by the tag information of the phrases in the negative phrase database. That is, if the target geographic information vocabulary includes a phrase from the positive phrase database, it has a higher probability of being a sensitive vocabulary; if it includes a phrase from the negative phrase database, it has a lower probability of being a sensitive vocabulary.
The geographic information phrase database may also be updated periodically, for example by deleting existing phrases from it, adding new phrases to it, and/or modifying the tag information of its phrases.
In some embodiments, the geographic information phrase database further includes version information representing its current version. Since different versions of the database may differ in content, the version information makes each updated version easier to manage.
In some embodiments, the geographic information phrase database further includes a file flag indicating whether the database is valid. In use, the file flag in the database is read first and compared with a preset valid file flag. If they match, the database is valid, and the sensitivity of the target geographic information vocabulary can be determined on the basis of it.
In some embodiments, the geographic information phrase database may also include a weight for the database itself. Different phrase databases (e.g., the positive phrase database and the negative phrase database) may use different weights. Setting a database weight adjusts how strongly the tag information of the phrases in that database influences the sensitivity determination result.
In some embodiments, the geographic information phrase database may also include a weight for each individual geographic information phrase (called a phrase weight). Setting phrase weights adjusts how strongly each phrase's tag information influences the sensitivity determination result. For example, a geographic information vocabulary containing the phrase "army" is more likely to be sensitive than one containing the phrase "noodle restaurant", so the weight of "army" may be greater than that of "noodle restaurant".
In some embodiments, each geographic information phrase in the geographic information phrase database, tag information corresponding to the geographic information phrase, a weight of the geographic information phrase database, and a weight corresponding to the geographic information phrase may be encrypted.
The geographic information phrase database of some embodiments is shown in fig. 3. It comprises header information and table data: the header information includes the file flag, the file version information, and the weight of the phrase database, while the table data records the geographic information phrases, their tag information, and their weights. The database weight, the phrases, their tag information, and the phrase weights may all be stored as encrypted data. Where the phrase database comprises a positive phrase database and a negative phrase database, both may use the data structure shown in fig. 3.
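The header-plus-table layout of fig. 3 might be serialized roughly as follows. This is a sketch under stated assumptions: the byte layout, field widths, the flag value `GPDB`, and all names are hypothetical, and encryption of the fields is omitted for clarity. The parser performs the file-flag validity check described above, rejecting any file whose flag does not match the preset valid flag.

```python
import struct
from dataclasses import dataclass

VALID_FLAG = b"GPDB"  # hypothetical preset valid file flag

@dataclass
class PhraseRow:
    phrase: str
    tag: float     # probability represented by the tag information
    weight: float  # phrase weight

@dataclass
class PhraseDatabase:
    version: int      # file version information
    db_weight: float  # weight of this phrase database
    rows: list        # table data: list of PhraseRow

HEADER = struct.Struct("<4sHd")  # file flag, version, database weight
ROW_NUMS = struct.Struct("<dd")  # tag, phrase weight

def serialize(db: PhraseDatabase, flag: bytes = VALID_FLAG) -> bytes:
    out = bytearray(HEADER.pack(flag, db.version, db.db_weight))
    out += struct.pack("<I", len(db.rows))  # number of table rows
    for row in db.rows:
        encoded = row.phrase.encode("utf-8")
        out += struct.pack("<H", len(encoded)) + encoded  # length-prefixed phrase
        out += ROW_NUMS.pack(row.tag, row.weight)
    return bytes(out)

def parse(data: bytes) -> PhraseDatabase:
    flag, version, db_weight = HEADER.unpack_from(data, 0)
    if flag != VALID_FLAG:  # validity check on the file flag
        raise ValueError("invalid file flag: database rejected")
    offset = HEADER.size
    (count,) = struct.unpack_from("<I", data, offset)
    offset += 4
    rows = []
    for _ in range(count):
        (length,) = struct.unpack_from("<H", data, offset)
        offset += 2
        phrase = data[offset:offset + length].decode("utf-8")
        offset += length
        tag, weight = ROW_NUMS.unpack_from(data, offset)
        offset += ROW_NUMS.size
        rows.append(PhraseRow(phrase, tag, weight))
    return PhraseDatabase(version, db_weight, rows)
```

The positive and negative phrase databases could each be written to their own file with this same layout.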
In step S14, the target geographic information vocabulary may be segmented into a plurality of word segments. These may include geographic information phrases as well as non-geographic segments, including but not limited to personal names, times, and numbers. For example, if the target vocabulary is "Zhang San Pinggu", segmentation yields two segments, "Zhang San" and "Pinggu", where "Zhang San" is a personal name and "Pinggu" is a geographic information phrase.
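The source does not specify a segmentation algorithm. As a plainly swapped-in stand-in, the sketch below uses greedy forward maximum matching against a small hypothetical dictionary containing 张三 ("Zhang San") and 平谷 ("Pinggu"); a production system would more likely use a trained Chinese word segmenter.

```python
def forward_max_match(text: str, dictionary: set, max_len: int = 4) -> list:
    """Greedy forward maximum matching: at each position take the longest
    dictionary entry that matches, falling back to a single character."""
    segments = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                segments.append(candidate)
                i += length
                break
    return segments

# Hypothetical dictionary: a personal name and a place name.
DICTIONARY = {"张三", "平谷"}
```

For example, `forward_max_match("张三平谷", DICTIONARY)` yields `["张三", "平谷"]`, i.e. "Zhang San" and "Pinggu".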
Each of the word segments may then be matched against the geographic information phrases in the phrase database. When the phrase database comprises a positive phrase database and a negative phrase database, matching may proceed in any of three ways: (1) each segment is matched against the positive phrase database first, and against the negative phrase database only if no match is found there; (2) each segment is matched against the negative phrase database first, falling back to the positive phrase database if no match is found; or (3) each segment is matched against both databases in parallel.
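A minimal sketch of the three matching orders, assuming exact string matching and plain dictionaries (phrase to tag probability) as stand-ins for the decrypted databases; the `order` parameter and all sample data are hypothetical. "Parallel" is modeled here as consulting both databases independently; it could equally be run concurrently.

```python
from typing import Dict, List, Tuple

Match = Tuple[str, str]  # (which database matched, matched phrase)

def match_segments(segments: List[str], positive: Dict[str, float],
                   negative: Dict[str, float], order: str = "positive_first") -> List[Match]:
    """Exact-match each segment against the positive and negative phrase databases.

    order="positive_first": try positive, fall back to negative on a miss.
    order="negative_first": try negative, fall back to positive on a miss.
    order="parallel": consult both independently (a segment may match in both).
    """
    matches: List[Match] = []
    for seg in segments:
        if order == "parallel":
            if seg in positive:
                matches.append(("positive", seg))
            if seg in negative:
                matches.append(("negative", seg))
            continue
        if order == "positive_first":
            lookup = [("positive", positive), ("negative", negative)]
        elif order == "negative_first":
            lookup = [("negative", negative), ("positive", positive)]
        else:
            raise ValueError(f"unknown order: {order}")
        for name, db in lookup:
            if seg in db:
                matches.append((name, seg))
                break  # stop at the first database that matches
    return matches
```

The fallback orders only produce different results when a segment appears in both databases.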
In some embodiments, each geographic information phrase in the phrase database and its tag information are pre-encrypted. The phrases in the database may be decrypted first, and each word segment is then matched against the decrypted phrases. The tag information of the phrases that successfully match the segments may then be decrypted, and the sensitivity of the target geographic information vocabulary is determined from the decrypted tag information of each matched phrase.
For example, each phrase in the database may first be decrypted, and the decrypted phrases are matched against the word segments of the target geographic information vocabulary. If phrase a in the database successfully matches one of the segments of the target vocabulary, the tag information of phrase a is decrypted, and the sensitivity of the target vocabulary is determined from the decrypted tag information of phrase a.
The above is only one possible implementation. In another implementation, the word segments of the target geographic information vocabulary are encrypted in the same manner as the phrases, the encrypted segments are matched against the phrases in the phrase database, the successfully matched phrases are selected, their tag information is decrypted, and the sensitivity of the target vocabulary is determined from the decrypted tag information. This approach reduces the risk of leaking the phrases and improves data security.
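The encrypt-then-match variant can be sketched as follows, under loudly stated assumptions: a keyed HMAC-SHA256 token stands in for the deterministic phrase encryption, and tags are protected by a toy XOR keystream derived from SHA-256, which is for illustration only and is NOT secure. The key and all names are hypothetical. Plaintext phrases never appear in the stored database, and only the tags of matched phrases are decrypted.

```python
import hashlib
import hmac
from typing import Dict, List

KEY = b"example-key"  # hypothetical key

def phrase_token(phrase: str) -> str:
    """Deterministic keyed token used in place of an encrypted phrase (assumed HMAC-SHA256)."""
    return hmac.new(KEY, phrase.encode("utf-8"), hashlib.sha256).hexdigest()

def _keystream(nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256(key || nonce || counter). Illustration only, NOT secure.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(KEY + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_tag(token: str, tag: float) -> bytes:
    plain = repr(tag).encode("ascii")
    stream = _keystream(token.encode("ascii"), len(plain))
    return bytes(a ^ b for a, b in zip(plain, stream))

def decrypt_tag(token: str, blob: bytes) -> float:
    stream = _keystream(token.encode("ascii"), len(blob))  # XOR is its own inverse
    return float(bytes(a ^ b for a, b in zip(blob, stream)).decode("ascii"))

def build_encrypted_db(plain: Dict[str, float]) -> Dict[str, bytes]:
    """Map phrase token -> encrypted tag; no plaintext phrase is stored."""
    db = {}
    for phrase, tag in plain.items():
        token = phrase_token(phrase)
        db[token] = encrypt_tag(token, tag)
    return db

def matched_tags(segments: List[str], encrypted_db: Dict[str, bytes]) -> List[float]:
    tags = []
    for seg in segments:
        token = phrase_token(seg)  # encrypt the segment the same way as the phrases
        if token in encrypted_db:
            tags.append(decrypt_tag(token, encrypted_db[token]))  # decrypt matched tags only
    return tags
```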
Through this matching process, the phrases in the phrase database that successfully match each word segment can be identified, and the sensitivity of the target geographic information vocabulary can be determined from them. For example, suppose the target vocabulary is segmented into four segments A, B, C, and D, where A and B both match phrase a in the positive phrase database, C matches no phrase in the phrase database, and D matches phrase d in the negative phrase database. The sensitivity of the target vocabulary can then be determined from the tag information of phrase a and the tag information of phrase d.
In some embodiments, a word segment is determined to match a geographic information phrase if the two are identical; otherwise, the match fails.
In other embodiments, a word segment and a geographic information phrase may be compared along two dimensions, pronunciation and glyph (character form), to obtain their pronunciation similarity and glyph similarity. A total similarity is determined from these two values, and whether the segment matches the phrase is decided from the total similarity. For example, if the total similarity exceeds a preset similarity threshold, the match succeeds; otherwise, it fails. This reduces inaccurate matching caused by misspellings or similar pronunciations: for example, a user uploading the vocabulary "XX army" through an open map service interface may type "army" as the homophone "part pair", or "Luo Yang" output by the vehicle-mounted voice module may be misrecognized as "Luoyang".
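A sketch of the two-dimensional similarity match, assuming a linear combination of the two similarities. The source gives no formulas, so the weights 0.6/0.4, the threshold 0.7, and the tiny per-character pinyin table are all hypothetical. Glyph similarity is approximated by character overlap via `difflib.SequenceMatcher`; a real glyph measure might compare radicals or stroke sequences. The test pairs assume 部队 ("army") versus its homophone 部对 ("part pair"), and 罗阳 versus 洛阳 (Luoyang).

```python
from difflib import SequenceMatcher

# Hypothetical per-character pinyin table (tones ignored); a real system
# would use a full pronunciation dictionary.
PINYIN = {"部": "bu", "队": "dui", "对": "dui", "洛": "luo", "罗": "luo", "阳": "yang"}

def pronunciation_similarity(a: str, b: str) -> float:
    """Compare syllable sequences; unknown characters stand for themselves."""
    pa = [PINYIN.get(ch, ch) for ch in a]
    pb = [PINYIN.get(ch, ch) for ch in b]
    return SequenceMatcher(None, pa, pb).ratio()

def glyph_similarity(a: str, b: str) -> float:
    """Stand-in glyph measure: character-level overlap ratio."""
    return SequenceMatcher(None, a, b).ratio()

def total_similarity(a: str, b: str, w_pron: float = 0.6, w_glyph: float = 0.4) -> float:
    return w_pron * pronunciation_similarity(a, b) + w_glyph * glyph_similarity(a, b)

def is_match(segment: str, phrase: str, threshold: float = 0.7) -> bool:
    """Match succeeds only when the total similarity exceeds the threshold."""
    return total_similarity(segment, phrase) > threshold
```

With these settings, the homophone pair 部对/部队 scores 0.6 × 1.0 + 0.4 × 0.5 = 0.8 and is treated as a match, although exact matching would reject it.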
In some embodiments, each word segment may be matched against the phrases in the positive phrase database to determine the first tag information of each first geographic information phrase, i.e., each phrase in the positive phrase database that matches a segment. Likewise, each segment is matched against the phrases in the negative phrase database to determine the second tag information of each second geographic information phrase, i.e., each phrase in the negative phrase database that matches a segment. The sensitivity of the target geographic information vocabulary is then determined from the first tag information of each first phrase and the second tag information of each second phrase.
Following the previous example, the first geographic information phrase successfully matched with the plurality of word segments in the positive phrase database includes a geographic information phrase a, and the second geographic information phrase successfully matched with the plurality of word segments in the negative phrase database includes a geographic information phrase d, so that the sensitivity of the target geographic information vocabulary can be determined based on the tag information of the geographic information phrase a and the tag information of the geographic information phrase d.
Further, a first weight corresponding to the positive phrase database and a second weight corresponding to the negative phrase database may be determined, first tag information of each first geographic information phrase is weighted based on the first weight, second tag information of each second geographic information phrase is weighted based on the second weight, and sensitivity of the target geographic information vocabulary is determined based on the weighted first tag information of each first geographic information phrase and the weighted second tag information of each second geographic information phrase.
Following the previous example, the tag information of the geographic information phrase a may be weighted based on a first weight and the tag information of the geographic information phrase d may be weighted based on a second weight. Then, whether the target geographic information vocabulary is a sensitive vocabulary is determined based on the label information weighted by the geographic information phrase a and the label information weighted by the geographic information phrase d.
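The A, B, C, D example above can be traced with the sketch below, which applies the first (positive) and second (negative) database weights to the matched tags. The matcher is pluggable so that several segments can resolve to one phrase; each matched phrase is counted once. All weights, tags, and names are hypothetical, and how the two weighted sets are finally combined is left to the caller because this passage does not fix it.

```python
from typing import Callable, Dict, List, Optional, Tuple

Matcher = Callable[[str, Dict[str, float]], Optional[str]]

def exact_match(segment: str, db: Dict[str, float]) -> Optional[str]:
    return segment if segment in db else None

def weighted_tags(segments: List[str], positive: Dict[str, float], negative: Dict[str, float],
                  first_weight: float, second_weight: float,
                  matcher: Matcher = exact_match) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return {phrase: weighted tag} for first (positive) and second (negative) matches."""
    first: Dict[str, float] = {}
    second: Dict[str, float] = {}
    for seg in segments:
        p = matcher(seg, positive)
        if p is not None:
            first[p] = first_weight * positive[p]  # keyed by phrase: counted once
        n = matcher(seg, negative)
        if n is not None:
            second[n] = second_weight * negative[n]
    return first, second
```

In the test, a hypothetical alias matcher maps segments A and B to phrase a and D to phrase d, reproducing the example: phrase a appears once, and C contributes nothing.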
Further, before the sensitivity of the target geographic information vocabulary is determined from the first tag information of each first geographic information phrase and the second tag information of each second geographic information phrase, the weight of each first phrase and each second phrase may also be determined. The first tag information of each first phrase is then weighted by that phrase's weight, and the second tag information of each second phrase is weighted by that phrase's weight.
In some embodiments, the weight of a geographic information phrase is related to the level of sensitivity of the geographic information phrase. The sensitivity level of the geographic information phrase can be determined first, and then the weight of the geographic information phrase is determined according to the sensitivity level of the geographic information phrase. The higher the sensitivity level of the geographic information phrase is, the higher the probability that the geographic information word comprising the geographic information phrase is a sensitive word is. The weights corresponding to the different sensitivity levels may be different.
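A minimal sketch of deriving phrase weights from sensitivity levels and applying them together with the database weight. The level-to-weight table is hypothetical; the source states only that different levels may receive different weights, with higher levels indicating a higher probability of sensitivity. Multiplying the database weight and the phrase weight onto each tag is also an assumption.

```python
from typing import Dict

# Hypothetical mapping from sensitivity level (1 = lowest) to phrase weight.
LEVEL_WEIGHTS: Dict[int, float] = {1: 0.5, 2: 1.0, 3: 2.0}

def phrase_weight(level: int) -> float:
    """Look up the phrase weight assigned to a sensitivity level."""
    return LEVEL_WEIGHTS[level]

def doubly_weighted(tags: Dict[str, float], levels: Dict[str, int],
                    db_weight: float) -> Dict[str, float]:
    """Apply the database weight and each phrase's level-derived weight to its tag."""
    return {p: db_weight * phrase_weight(levels[p]) * t for p, t in tags.items()}
```

For instance, with a database weight of 1.0, "army" at level 3 (tag 0.7) becomes 1.4, while "noodle restaurant" at level 1 (tag 0.1) becomes 0.05.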
In practice, the number of phrases that match segments of the target geographic information vocabulary is not limited to the cases shown in the examples above; however many phrases match, the sensitivity of the target vocabulary can be determined in a manner similar to the above embodiments.
A specific embodiment of the present application and its application scenario will be illustrated with reference to the accompanying drawings.
The application can be used in the fields of intelligent connected vehicles and autonomous driving to determine the sensitivity of geographic information vocabularies collected by vehicle-mounted sensors and to output a determination result. The application mainly comprises three parts: a knowledge base, decision reference materials, and semantic decision.
The overall system architecture is shown in fig. 4. It comprises a compliance strong positive word stock, a compliance strong positive word stock comparison table, a compliance strong positive semantic decision module, a compliance strong negative word stock comparison table, a compliance strong negative semantic decision module, a compliance weak positive word stock comparison table, a compliance weak positive semantic decision module, a compliance weak negative word stock comparison table, and a compliance weak negative semantic decision module.
The knowledge base comprises a positive vocabulary database (also called the compliance strong positive word stock), a negative vocabulary database (also called the compliance strong negative word stock), a positive phrase database (also called the compliance weak positive word stock), and a negative phrase database (also called the compliance weak negative word stock). The knowledge base can be continuously accumulated and updated from the hands-on experience of an expert team.
The geographic information vocabularies in the compliance strong positive word stock are publishable, non-sensitive geographic information vocabularies related to intelligent connected vehicles and autonomous driving (i.e., the non-sensitive vocabularies of the above embodiments);
The geographic information vocabularies in the compliance strong negative word stock are unpublishable, highly sensitive geographic information vocabularies related to intelligent connected vehicles and autonomous driving (i.e., the sensitive vocabularies of the above embodiments);
The geographic information phrases in the compliance weak positive word stock are publishable phrases related to intelligent connected vehicles and autonomous driving, but they may form objectionable geographic information vocabularies when combined with weakly sensitive phrases; the tag information in the compliance weak positive word stock therefore represents the probability that a geographic information vocabulary containing the phrase is a non-sensitive vocabulary;
The geographic information phrases in the compliance weak negative word stock are weakly sensitive phrases related to intelligent connected vehicles and autonomous driving, so the tag information in the compliance weak negative word stock represents the probability that a geographic information vocabulary containing the phrase is a sensitive vocabulary.
The decision reference materials comprise two parts: comparison tables and decision principles. The comparison tables are the compliance strong positive word stock comparison table (containing a number of non-sensitive vocabularies), the compliance strong negative word stock comparison table (containing a number of sensitive vocabularies), the compliance weak positive word stock comparison table (containing a number of geographic information phrases and their tag information, where the tag information represents the probability that a geographic information vocabulary containing the phrase is non-sensitive), and the compliance weak negative word stock comparison table (containing a number of geographic information phrases and their tag information, where the tag information represents the probability that a geographic information vocabulary containing the phrase is sensitive).
Every vocabulary in the compliance strong positive word stock is converted into an irreversible unique digital identifier using commercial cryptographic data protection technology, and the identifiers are stored in a private-format file; this set of identifiers constitutes the compliance strong positive word stock comparison table. Likewise, every vocabulary in the compliance strong negative word stock is converted into an irreversible unique digital identifier using commercial cryptographic data protection technology and stored in a private-format file, which constitutes the compliance strong negative word stock comparison table.
The compliance weak positive word stock can be continuously accumulated and updated from the expert team's hands-on experience. The sensitivity level of each phrase is determined by how objectionable the phrase may become when combined with other phrases, and phrases at different sensitivity levels are assigned different weights; for example, N weight grades may be used to form N sensitivity levels. After each update of the compliance weak positive word stock, the phrase weights can be adjusted in coordination with the compliance weak negative word stock. The phrases, their weights and tag information, the weight of the compliance weak positive word stock, and other information are encrypted and stored in a private-format file, namely the compliance weak positive word stock comparison table. The specific steps are shown in fig. 5.
The compliance weak negative word stock comparison table is processed similarly, and the details are not repeated here.
The various weights in the above embodiments may be determined according to the technical decision rules formed from the expert team's continuously accumulated and updated experience, and depend on the compliance weak positive word stock scoring table and the compliance weak negative word stock scoring table. The scoring and decision steps of the semantic decision module likewise depend on the scoring decision principles.
As shown in FIG. 6, semantic decision mainly comprises the following steps: 1) compliance strong positive word stock determination, 2) compliance strong negative word stock determination, 3) compliance weak positive word stock determination, 4) compliance weak negative word stock determination, and 5) output of the determination result. The result indicates whether the input geographic information vocabulary may be disclosed, transmitted, or otherwise used.
① Compliance strong positive word stock determination
Generate an irreversible unique digital identifier for the input target geographic information vocabulary using commercial cryptographic data protection technology, and check whether the identifier is contained in the compliance strong positive word stock comparison table. If it is, the target vocabulary is determined to be non-sensitive, the result is returned, and the process ends; otherwise, proceed to step ②;
② Compliance strong negative word stock determination
Generate an irreversible unique digital identifier for the input target geographic information vocabulary using commercial cryptographic data protection technology, and check whether the identifier is contained in the compliance strong negative word stock comparison table. If it is, the target vocabulary is determined to be sensitive, the result is returned, and the process ends; otherwise, proceed to step ③;
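Building the strong comparison tables and running steps ① and ② might look like the sketch below. The source refers only to "commercial cryptographic data protection technology"; plain SHA-256 stands in for that irreversible identifier here, and all names and sample entries are hypothetical. A `None` result means neither strong table was hit and the process falls through to step ③.

```python
import hashlib
from typing import Iterable, Optional, Set

def unique_id(term: str) -> str:
    """Irreversible digital identifier (SHA-256 used as a stand-in)."""
    return hashlib.sha256(term.encode("utf-8")).hexdigest()

def build_table(terms: Iterable[str]) -> Set[str]:
    """Comparison table: only identifiers are kept, not the vocabularies themselves."""
    return {unique_id(t) for t in terms}

def strong_lexicon_decision(target: str, strong_positive: Set[str],
                            strong_negative: Set[str]) -> Optional[str]:
    """Steps 1 and 2: return a verdict on a hit, or None to continue with step 3."""
    ident = unique_id(target)
    if ident in strong_positive:
        return "non-sensitive"
    if ident in strong_negative:
        return "sensitive"
    return None
```

One design caveat: an unkeyed hash of short place names can be reversed by brute-force guessing over candidate names, so a keyed construction would be a safer choice in practice.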
③ Compliance weak positive word stock determination
Traverse the compliance weak positive word stock scoring table: perform regular-expression matching between the input target geographic information vocabulary and each decrypted phrase in the table to obtain the weight and probability (i.e., tag information) of each phrase, and accumulate the probabilities of all successfully matched phrases in the table to obtain an accumulated total score. The dynamic scoring formula is as follows:
where $y_{\text{positive}}$ denotes the accumulated total score of the successfully matched phrases in the compliance weak positive word stock scoring table, $\alpha_1, \alpha_2, \ldots, \alpha_N$ denote the weights of the N phrases in that table, and $x_1, x_2, \ldots, x_N$ denote the probabilities of the corresponding N phrases, each determined from the sensitivity level of that phrase.
④ Compliance weak negative word stock determination
Traverse the compliance weak negative word stock scoring table: perform regular-expression matching between the input target geographic information vocabulary and each decrypted phrase in the table to obtain the weight and probability (i.e., tag information) of each phrase, and accumulate the probabilities of all successfully matched phrases in the table to obtain an accumulated total score. The dynamic scoring formula is as follows:
where $y_{\text{negative}}$ denotes the accumulated total score of the successfully matched phrases in the compliance weak negative word stock scoring table, $\beta_1, \beta_2, \ldots, \beta_M$ denote the weights of the M phrases in that table, and $y_1, y_2, \ldots, y_M$ denote the probabilities of the corresponding M phrases, each determined from the sensitivity level of that phrase.
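Steps ③ and ④ share one mechanism, sketched below. The scoring formulas themselves are not reproduced in this text; the sketch assumes each accumulated total score is a weighted sum over the matched phrases (for step ③, $\sum_i \alpha_i x_i$ over matched phrases, and analogously $\sum_j \beta_j y_j$ for step ④), with unmatched phrases contributing nothing. Names and sample data are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class ScoredPhrase:
    pattern: str        # decrypted phrase, used as a regular expression
    weight: float       # alpha_i (weak positive) or beta_j (weak negative)
    probability: float  # x_i or y_j, derived from the phrase's sensitivity level

def accumulated_score(target: str, table: List[ScoredPhrase]) -> float:
    """Assumed formula: sum of weight * probability over phrases whose pattern matches."""
    return sum(p.weight * p.probability for p in table if re.search(p.pattern, target))
```

The same function serves both scoring tables: calling it with the weak positive table yields $y_{\text{positive}}$, and with the weak negative table yields $y_{\text{negative}}$. Phrases that contain regex metacharacters but are meant literally should be passed through `re.escape` first.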
⑤ Outputting the determination result
For a given target geographic information vocabulary, its sensitivity is determined from the accumulated total score of the compliance weak positive word stock and that of the compliance weak negative word stock, using the following formula:
where $\gamma_{\text{positive}}$ and $\gamma_{\text{negative}}$ denote the weights of the compliance weak positive word stock and the compliance weak negative word stock, respectively, and $S_{\text{total}}$ denotes the probability that the target geographic information vocabulary is a sensitive vocabulary.
The two word stock weights keep the value range consistent as the accumulated total scores are dynamically updated with the knowledge base. The accumulated total scores come from the compliance weak positive word stock scoring table file and the compliance weak negative word stock scoring table file, respectively, and are updated dynamically as the two word stocks are synchronized.
Based on the total score $S_{\text{total}}$ and the built-in decision principles, a unique decision result is given, indicating whether the target geographic information vocabulary may be disclosed, transmitted, or otherwise used. For example, when $S_{\text{total}}$ exceeds a threshold, the target vocabulary may be determined to be sensitive and may not be disclosed or transmitted; when $S_{\text{total}}$ is less than or equal to the threshold, it may be determined to be non-sensitive and may be disclosed or transmitted.
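Step ⑤ can be sketched as follows. Because the combining formula is not reproduced in this text, the way $\gamma_{\text{positive}}$, $\gamma_{\text{negative}}$ and the two accumulated scores are combined is purely an assumption: here weak-negative evidence raises the score and weak-positive evidence lowers it. The threshold rule follows the description above (strictly greater than the threshold means sensitive). All names and values are hypothetical.

```python
def total_score(y_positive: float, y_negative: float,
                gamma_positive: float, gamma_negative: float) -> float:
    # ASSUMPTION: the source's combining formula is not reproduced; this form
    # lets weak-negative evidence raise and weak-positive evidence lower S_total.
    return gamma_negative * y_negative - gamma_positive * y_positive

def decide(s_total: float, threshold: float) -> str:
    """Above the threshold: sensitive; at or below it: non-sensitive."""
    if s_total > threshold:
        return "sensitive: disclosure/transmission not allowed"
    return "non-sensitive: disclosure/transmission allowed"
```

For example, `decide(total_score(0.2, 1.4, 1.0, 1.0), 0.5)` evaluates 1.4 - 0.2 = 1.2 against the threshold and reports the vocabulary as sensitive.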
The application builds a multidimensional knowledge base from experience that an expert team has accumulated over long-term hands-on work. The knowledge base covers broad industry knowledge and experience and, through continuous updates by the expert team, stays in step with industry developments, so it can cope with changing hazard identification requirements.
Based on this multidimensional knowledge base, the application further derives decision reference materials and semantic decision logic. These decision references and semantic decision logic combine the expertise and experience within the industry to comprehensively evaluate the input vocabulary from multiple perspectives. By scoring the sensitivity of the vocabulary from multiple dimensions, the potential hazard level of the vocabulary can be more accurately determined.
To improve evaluation accuracy, the application provides a rigorous semantic decision module. Rather than simply matching vocabulary, the module assesses whether a vocabulary constitutes a hazard by assigning different weights and probabilities to different geographic information phrases. Its multi-step, comprehensive scoring mechanism takes the sensitivity levels of phrases into account, yielding a more accurate and reliable determination result.
As shown in fig. 7, the present application further provides a device for determining sensitivity of a geographic information vocabulary, where the device includes:
A first obtaining module 101, configured to obtain a target geographic information vocabulary;
a determining module 102, configured to determine whether the target geographic information vocabulary hits a pre-established geographic information vocabulary database, where the geographic information vocabulary database includes a plurality of geographic information vocabularies, each geographic information vocabulary corresponds to a point of interest, the point of interest represents a geographic location with uniqueness and certainty, and the plurality of geographic information vocabularies includes a sensitive vocabulary and a non-sensitive vocabulary;
a second obtaining module 103, configured to obtain a pre-established geographic information phrase database if the target geographic information vocabulary does not hit the geographic information vocabulary database, where the geographic information phrase database includes a plurality of geographic information phrases and their tag information, a plurality of geographic information phrases can form a geographic information vocabulary, and the tag information of a geographic information phrase represents the probability that a geographic information vocabulary containing that phrase is a sensitive vocabulary or a non-sensitive vocabulary;
the matching module 104 is configured to segment the target geographic information vocabulary to obtain a plurality of segments, match each segment in the plurality of segments with a geographic information phrase in the geographic information phrase database, and determine the sensitivity of the target geographic information vocabulary based on the tag information of each geographic information phrase successfully matched with the segment in the geographic information phrase database.
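The two-stage flow carried out by these modules can be sketched as follows. This is a minimal illustration and not the application's implementation. The dictionaries `VOCABULARY_DB` and `PHRASE_DB` and their contents are hypothetical. The segmenter is forward maximum matching, a common dictionary-based baseline used here because the application does not name a segmentation method. The matched tags are combined by a plain average against an assumed threshold of 0.5.

```python
from typing import Dict, List, Optional

# Hypothetical vocabulary database: vocabulary -> True if sensitive.
VOCABULARY_DB: Dict[str, bool] = {
    "东湖公园": False,  # "East Lake Park"
}

# Hypothetical phrase database: phrase -> tag, read here as
# P(a vocabulary containing this phrase is sensitive).
PHRASE_DB: Dict[str, float] = {
    "北港": 0.2,  # "North Harbor"
    "海军": 0.9,  # "navy"
    "基地": 0.7,  # "base"
    "公园": 0.1,  # "park"
}


def segment(text: str, phrases: Dict[str, float]) -> List[str]:
    """Forward maximum matching: at each position take the longest known phrase,
    falling back to a single character when no phrase starts there."""
    max_len = max(map(len, phrases), default=1)
    segments: List[str] = []
    i = 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in phrases or j == i + 1:
                segments.append(text[i:j])
                i = j
                break
    return segments


def determine_sensitivity(vocab: str, threshold: float = 0.5) -> Optional[bool]:
    """Return True/False for sensitive/non-sensitive, or None if undecidable."""
    # Stage 1: an exact hit in the vocabulary database decides directly.
    if vocab in VOCABULARY_DB:
        return VOCABULARY_DB[vocab]
    # Stage 2: segment, keep segments found in the phrase database, average their tags.
    tags = [PHRASE_DB[s] for s in segment(vocab, PHRASE_DB) if s in PHRASE_DB]
    if not tags:
        return None  # no phrase matched (assumed to need manual review)
    return sum(tags) / len(tags) >= threshold


for name in ("东湖公园", "北港海军基地", "北港公园", "西山"):
    print(name, determine_sensitivity(name))
```

With these sample entries, 东湖公园 is decided directly by the vocabulary database. 北港海军基地 misses the vocabulary database, so it is segmented into 北港 / 海军 / 基地, whose tags average about 0.6, above the assumed threshold.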
The functions or modules of the apparatus provided by the present application can be used to perform the methods described in the foregoing method embodiments. For their specific implementation, refer to the descriptions of those embodiments, which are not repeated here for brevity.
The embodiment of the application also provides a computer device, which at least comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to implement the method of any of the previous embodiments.
Fig. 8 shows a more specific hardware architecture of a computer device according to an embodiment of the present application. The device may include a processor 201, a memory 202, an input/output interface 203, a communication interface 204, and a bus 205. The processor 201, the memory 202, the input/output interface 203, and the communication interface 204 are communicatively connected to one another within the device via the bus 205.
The processor 201 may be implemented as a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), one or more integrated circuits, or the like. It executes the relevant programs to implement the technical solutions provided by the embodiments of the present application. The processor 201 may also include a graphics card, such as an NVIDIA TITAN X or a 1080 Ti graphics card.
The memory 202 may be implemented as read-only memory (ROM), random access memory (RAM), a static storage device, a dynamic storage device, or the like. The memory 202 may store an operating system and other application programs. When the technical solutions provided by the embodiments of the present application are implemented in software or firmware, the relevant program code is stored in the memory 202 and is invoked and executed by the processor 201.
The input/output interface 203 connects to an input/output module to enable information input and output. The input/output module may be built into the device as a component (not shown in the figure), or it may be external to the device and provide the corresponding functions. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and the like. The output devices may include a display, a speaker, a vibrator, indicator lights, and the like.
The communication interface 204 connects to a communication module (not shown in the figure) to enable communication between the present device and other devices. The communication module may communicate over a wired connection (such as USB or a network cable) or wirelessly (such as over a mobile network, Wi-Fi, or Bluetooth).
Bus 205 includes a path to transfer information between components of the device (e.g., processor 201, memory 202, input/output interface 203, and communication interface 204).
It should be noted that, although only the processor 201, the memory 202, the input/output interface 203, the communication interface 204, and the bus 205 are shown for the above device, in a specific implementation the device may further include other components necessary for normal operation. Furthermore, those skilled in the art will understand that the above device may include only the components necessary to implement the embodiments of the present application, and need not include all the components shown in the drawings.
Embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present application.
The embodiment of the present application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the method according to any of the previous embodiments.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, and any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
The embodiments of the present application are described progressively. Identical or similar parts of the embodiments can be found by cross-referencing one another, and each embodiment focuses on its differences from the others. In particular, the device embodiments are substantially similar to the method embodiments, so their description is relatively brief; for the relevant details, refer to the description of the method embodiments. The apparatus embodiments described above are merely illustrative. Modules described as separate components may or may not be physically separate, and when implementing the embodiments of the present application, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement the present application without creative effort.
The foregoing is merely illustrative of the principles of this application, and it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of this application.

Claims (12)

1. A method for determining the sensitivity of a geographic information vocabulary, the method comprising:
acquiring a target geographic information vocabulary;
Determining whether the target geographic information vocabulary hits a pre-established geographic information vocabulary database, wherein the geographic information vocabulary database comprises a plurality of geographic information vocabularies, each geographic information vocabulary corresponds to an interest point, the interest point represents a geographic position with uniqueness and certainty, and the plurality of geographic information vocabularies comprise sensitive vocabularies and non-sensitive vocabularies;
If the target geographic information vocabulary does not hit the geographic information vocabulary database, obtaining a pre-established geographic information phrase database, wherein the geographic information phrase database comprises a plurality of geographic information phrases and their tag information, a plurality of geographic information phrases can form a geographic information vocabulary, and the tag information of a geographic information phrase is used for representing the probability that a geographic information vocabulary comprising that geographic information phrase is a sensitive vocabulary or a non-sensitive vocabulary;
Segmenting the target geographic information vocabulary to obtain a plurality of word segments, matching each word segment of the plurality of word segments with the geographic information phrases in the geographic information phrase database, and determining the sensitivity of the target geographic information vocabulary based on the tag information of each geographic information phrase in the geographic information phrase database that is successfully matched with a word segment.
2. The method according to claim 1, wherein the method further comprises:
If the target geographic information vocabulary hits the geographic information vocabulary database, determining the sensitivity of the target geographic information vocabulary based on the type of the geographic information vocabulary hit by the target geographic information vocabulary in the geographic information vocabulary database.
3. The method of claim 2, wherein the geographic information vocabulary database comprises a positive vocabulary database and a negative vocabulary database, the positive vocabulary database comprising the non-sensitive vocabulary and the negative vocabulary database comprising the sensitive vocabulary;
the determining the sensitivity of the target geographic information vocabulary based on the type of the geographic information vocabulary hit by the target geographic information vocabulary in the geographic information vocabulary database comprises:
if the target geographic information vocabulary hits the non-sensitive vocabulary in the positive vocabulary database, determining that the target geographic information vocabulary is the non-sensitive vocabulary;
and if the target geographic information vocabulary hits the sensitive vocabulary in the negative vocabulary database, determining that the target geographic information vocabulary is the sensitive vocabulary.
4. The method according to claim 1, wherein the geographic information phrase database comprises a positive phrase database and a negative phrase database, and the probability represented by the tag information corresponding to the geographic information phrase in the positive phrase database is greater than the probability represented by the tag information corresponding to the geographic information phrase in the negative phrase database;
The matching of each word segment of the plurality of word segments with the geographic information phrases in the geographic information phrase database, and the determining of the sensitivity of the target geographic information vocabulary based on the tag information of each geographic information phrase successfully matched with a word segment in the geographic information phrase database, comprise:
Matching each word segment of the plurality of word segments with the geographic information phrases in the positive phrase database, and determining first tag information of each first geographic information phrase in the positive phrase database that is successfully matched with the plurality of word segments;
Matching each word segment of the plurality of word segments with the geographic information phrases in the negative phrase database, and determining second tag information of each second geographic information phrase in the negative phrase database that is successfully matched with the plurality of word segments;
And determining the sensitivity of the target geographic information vocabulary based on the first tag information of each first geographic information phrase and the second tag information of each second geographic information phrase.
5. The method of claim 4, wherein determining the sensitivity of the target geographic information vocabulary based on the first tag information for each first geographic information phrase and the second tag information for each second geographic information phrase comprises:
Determining a first weight corresponding to the positive phrase database and a second weight corresponding to the negative phrase database;
weighting the first tag information of each first geographic information phrase based on the first weight, and weighting the second tag information of each second geographic information phrase based on the second weight;
and determining the sensitivity of the target geographic information vocabulary based on the weighted first tag information of each first geographic information phrase and the weighted second tag information of each second geographic information phrase.
6. The method of claim 4, wherein prior to determining the sensitivity of the target geographic information vocabulary based on the first tag information for each first geographic information phrase and the second tag information for each second geographic information phrase, the method further comprises:
Determining the weight of each first geographic information phrase and the weight of each second geographic information phrase, wherein the weight of a geographic information phrase is related to its sensitivity level, and the higher the sensitivity level of a geographic information phrase, the higher the probability that a geographic information vocabulary comprising that phrase is a sensitive vocabulary;
weighting the first tag information of each first geographic information phrase based on the weight of that first geographic information phrase, and weighting the second tag information of each second geographic information phrase based on the weight of that second geographic information phrase.
7. The method according to claim 1, wherein each geographic information vocabulary in the geographic information vocabulary database is subjected to encryption processing in advance;
The determining whether the target geographic information vocabulary hits the pre-established geographic information vocabulary database comprises the following steps:
encrypting the target geographic information vocabulary based on the encryption mode of each geographic information vocabulary in the geographic information vocabulary database;
determining whether the encrypted target geographic information vocabulary hits the pre-established geographic information vocabulary database.
8. The method according to claim 1, wherein each geographic information phrase in the geographic information phrase database and its tag information are encrypted in advance;
The matching of each word segment of the plurality of word segments with the geographic information phrases in the geographic information phrase database, and the determining of the sensitivity of the target geographic information vocabulary based on the tag information of each geographic information phrase successfully matched with a word segment in the geographic information phrase database, comprise:
Decrypting the geographic information phrases in the geographic information phrase database;
Matching each word segment of the plurality of word segments with the decrypted geographic information phrases in the geographic information phrase database;
decrypting the tag information of the geographic information phrases in the geographic information phrase database that are successfully matched with the plurality of word segments;
And determining the sensitivity of the target geographic information vocabulary based on the decrypted tag information of each successfully matched geographic information phrase.
9. A sensitivity determination apparatus for a geographic information vocabulary, the apparatus comprising:
the first acquisition module is used for acquiring a target geographic information vocabulary;
The determining module is used for determining whether the target geographic information vocabulary hits a pre-established geographic information vocabulary database, wherein the geographic information vocabulary database comprises a plurality of geographic information vocabularies, each geographic information vocabulary corresponds to an interest point, the interest points represent geographic positions with uniqueness and certainty, and the geographic information vocabularies comprise sensitive vocabularies and non-sensitive vocabularies;
The second acquisition module is used for acquiring a pre-established geographic information phrase database if the target geographic information vocabulary does not hit the geographic information vocabulary database, wherein the geographic information phrase database comprises a plurality of geographic information phrases and their tag information, a plurality of geographic information phrases can form a geographic information vocabulary, and the tag information of a geographic information phrase is used for representing the probability that a geographic information vocabulary comprising that geographic information phrase is a sensitive vocabulary or a non-sensitive vocabulary;
The matching module is used for segmenting the target geographic information vocabulary to obtain a plurality of word segments, matching each word segment of the plurality of word segments with the geographic information phrases in the geographic information phrase database, and determining the sensitivity of the target geographic information vocabulary based on the tag information of each geographic information phrase in the geographic information phrase database that is successfully matched with a word segment.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any one of claims 1 to 8.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 8 when executing the computer program.
12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 8.
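Claim 7 requires the target vocabulary to be processed with the same scheme as the stored vocabularies before comparison, so that lookup works without plaintext entries in the database. The sketch below is minimal and assumes a deterministic keyed digest (HMAC-SHA256 from Python's standard library) in place of the unspecified encryption method. Equal inputs produce equal digests, and that is the only property the lookup relies on. Unlike encryption, HMAC cannot be reversed, so this sketch does not cover the decrypt-then-match steps of claim 8. The function names `protect`, `build_protected_db` and `lookup` and the key value are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Optional


def protect(term: str, key: bytes) -> str:
    """Deterministic keyed digest (HMAC-SHA256) standing in for the unspecified
    encryption scheme: the same term and key always give the same digest."""
    return hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()


def build_protected_db(vocab_db: Dict[str, bool], key: bytes) -> Dict[str, bool]:
    """Store only digests of the vocabularies, each mapped to its sensitive flag."""
    return {protect(term, key): sensitive for term, sensitive in vocab_db.items()}


def lookup(target: str, protected_db: Dict[str, bool], key: bytes) -> Optional[bool]:
    """Process the target with the same scheme, then compare digests.
    Returns the stored flag on a hit, or None on a miss."""
    return protected_db.get(protect(target, key))


KEY = b"hypothetical-demo-key"  # illustrative only; a real key would be kept secret
db = build_protected_db({"东湖公园": False, "北港海军基地": True}, KEY)
print(lookup("北港海军基地", db, KEY), lookup("西山", db, KEY))
```

On a miss, `lookup` returns None, and the flow would fall through to the phrase-database stage of claim 1.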
CN202411917340.0A 2024-12-24 2024-12-24 Method and device for determining sensitivity of geographic information vocabulary Active CN119830906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411917340.0A CN119830906B (en) 2024-12-24 2024-12-24 Method and device for determining sensitivity of geographic information vocabulary

Publications (2)

Publication Number Publication Date
CN119830906A CN119830906A (en) 2025-04-15
CN119830906B true CN119830906B (en) 2025-10-21

Family

ID=95299020

Country Status (1)

Country Link
CN (1) CN119830906B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121211030A (en) * 2025-11-28 2025-12-26 湖北亿咖通科技有限公司 Geographic information compliance evaluation method, device, storage medium, and program product

Citations (2)

Publication number Priority date Publication date Assignee Title
CN103729420A (en) * 2013-12-20 2014-04-16 潘大庆 Microblog hotspot tracking system and method
CN112183087A (en) * 2020-09-27 2021-01-05 武汉华工安鼎信息技术有限责任公司 System and method for sensitive text recognition

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN112380323B (en) * 2020-12-01 2024-11-05 合肥大多数信息科技有限公司 A junk information elimination system and method based on Chinese word segmentation recognition technology
CN117909538A (en) * 2024-01-19 2024-04-19 上海点掌文化科技股份有限公司 Sensitive word detection method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant