Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. The term "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
In order to better understand the technical solution in the embodiments of the present application and make the above objects, features and advantages of the embodiments of the present application more comprehensible, the technical solution in the embodiments of the present application is described in further detail below with reference to the accompanying drawings.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and be provided with corresponding operation entries for the user to select authorization or rejection.
Many application scenarios involve the acquisition of geographic information, which may include sensitive information. In order to improve data security, sensitivity judgment is required for geographic information. For example, in a smart driving scenario, on-board sensors (e.g., GPS, radar, cameras, etc.) may be utilized to collect geographic information of the surrounding environment for planning routes and avoiding obstacles. The collected geographic information may include the names of institutions that handle classified matters, and such names are not suitable for disclosure. Sensitivity judgment is therefore used to identify these names in the collected data so that they can be filtered out and desensitized.
In the related art, a database of sensitive geographic information is generally established, the geographic information to be distinguished is matched with the geographic information in the database, and whether the geographic information to be distinguished is sensitive or not is determined according to a matching result. For example, if the database of the sensitive geographic information includes the "XX military base", the geographic information to be discriminated is successfully matched with the "XX military base" in the database of the sensitive geographic information when the geographic information to be discriminated includes the "XX military base", so that the geographic information to be discriminated can be judged as the sensitive geographic information.
However, geographic information may be updated continuously as time, place, or environment changes. For example, as national conditions develop, a country may establish new administrative institutions at confidential locations. Updates to the database of sensitive geographic information lag behind updates to the geographic information itself. If the geographic information corresponding to a newly added administrative institution is not recorded in the database of sensitive geographic information in time, the sensitivity of the actually acquired geographic information cannot be accurately judged by database matching alone.
Based on this, the present application provides a method for determining the sensitivity of a geographic information vocabulary. Referring to Fig. 1, the method includes:
Step S11, acquiring a target geographic information vocabulary;
Step S12, determining whether the target geographic information vocabulary hits a pre-established geographic information vocabulary database, wherein the geographic information vocabulary database comprises a plurality of geographic information vocabularies, each geographic information vocabulary corresponds to a point of interest, the point of interest represents a geographic position with uniqueness and certainty, and the geographic information vocabularies comprise sensitive vocabularies and non-sensitive vocabularies;
Step S13, if the target geographic information vocabulary does not hit the geographic information vocabulary database, obtaining a pre-established geographic information phrase database, wherein the geographic information phrase database comprises a plurality of geographic information phrases and tag information thereof, the geographic information phrases can form geographic information vocabularies, and the tag information of a geographic information phrase represents the probability that a geographic information vocabulary comprising the geographic information phrase is a sensitive vocabulary or a non-sensitive vocabulary;
Step S14, performing word segmentation on the target geographic information vocabulary to obtain a plurality of word segments, matching each word segment in the plurality of word segments with the geographic information phrases in the geographic information phrase database, and determining the sensitivity of the target geographic information vocabulary based on the tag information of each geographic information phrase in the geographic information phrase database that is successfully matched with a word segment.
Considering that sensitive geographic information vocabularies (i.e., sensitive vocabularies) often include designated elements, the present application builds a geographic information vocabulary database based on complete geographic information vocabularies, while building a geographic information phrase database based on the elements that make up the geographic information vocabularies (i.e., geographic information phrases). When the target geographic information vocabulary to be judged fails to hit the geographic information vocabulary database, the target geographic information vocabulary is segmented, the obtained word segments are matched with the elements that make up geographic information vocabularies, and the sensitivity of the target geographic information vocabulary is judged at the element level. Compared with relying solely on the geographic information vocabulary database, this approach can improve the accuracy of the sensitivity judgment when the geographic information vocabulary database has not been updated in time. Specific implementations of the application are illustrated below.
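The two-tier flow of steps S11 to S14 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name `determine_sensitivity`, the dictionary-based databases, the whitespace segmenter, the rule that the largest matched probability decides the outcome, and the 0.5 threshold are all assumptions introduced for the example.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class Sensitivity(Enum):
    SENSITIVE = "sensitive"
    NON_SENSITIVE = "non-sensitive"


def determine_sensitivity(
    target: str,
    vocabulary_db: Dict[str, Sensitivity],
    phrase_db: Dict[str, float],
    segment: Callable[[str], List[str]],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Optional[Sensitivity]:
    # Step S12: exact lookup of the complete vocabulary.
    hit = vocabulary_db.get(target)
    if hit is not None:
        return hit
    # Steps S13/S14: fall back to element-level matching.
    probabilities = [phrase_db[s] for s in segment(target) if s in phrase_db]
    if not probabilities:
        return None  # no element matched; this sketch leaves the case undecided
    # Assumption: the largest matched probability decides the outcome.
    score = max(probabilities)
    if score >= threshold:
        return Sensitivity.SENSITIVE
    return Sensitivity.NON_SENSITIVE


# Illustrative data only.
vocabulary_db = {
    "XX military base": Sensitivity.SENSITIVE,
    "XX park": Sensitivity.NON_SENSITIVE,
}
phrase_db = {"army": 0.7, "park": 0.1}  # probability of being sensitive

result = determine_sensitivity("YY army", vocabulary_db, phrase_db, str.split)
```

Here "YY army" misses the vocabulary database, but its segment "army" matches a phrase whose tag probability exceeds the threshold, so the sketch reports it as sensitive.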
In step S11, a target geographic information vocabulary may be acquired. The target geographic information vocabulary corresponds to a point of interest (POI), which represents a geographic location with uniqueness and certainty. That is, the target geographic information vocabulary corresponds to a geographic location having uniqueness and certainty. For example, the target geographic information vocabulary may be an address, such as "XX area XX street XX number in XX city XX", or the name of a unique and definite ground feature or building, such as "hometown", "Eiffel Tower", etc.
The acquisition mode of the target geographic information vocabulary can be different according to different practical application scenes. For example, in an intelligent driving scenario, the target geographic information vocabulary may be collected by an onboard sensor. In the intelligent traffic management system, information such as traffic flow, road conditions and the like can be identified by utilizing a monitoring camera installed at an intersection or a highway and combining with a computer vision technology, so that a target geographic information vocabulary is obtained. In the application of map and location services, the target geographic information vocabulary may be obtained by interfacing with an interface of an open map service.
In step S12, a pre-established geographic information vocabulary database may be obtained, where the geographic information vocabulary database includes a plurality of geographic information vocabularies, and each geographic information vocabulary corresponds to a point of interest. The plurality of geographic information vocabularies in the geographic information vocabulary database may include sensitive vocabularies and non-sensitive vocabularies. A sensitive vocabulary is a geographic information vocabulary that is sensitive and must not be disclosed, and a non-sensitive vocabulary is a geographic information vocabulary that is not sensitive and may be disclosed.
In some embodiments, tag information may be added to each geographic information word in the geographic information word database to indicate whether the geographic information word is a sensitive word or a non-sensitive word. In other embodiments, different geographic information vocabulary databases may be established for sensitive vocabulary and non-sensitive vocabulary, respectively. For example, a positive vocabulary database and a negative vocabulary database may be established, wherein the geographic information vocabularies included in the positive vocabulary database are all non-sensitive vocabularies, and the geographic information vocabularies in the negative vocabulary database are all sensitive vocabularies.
The geographic information vocabulary database may also be updated at a certain frequency, for example, by deleting an existing geographic information vocabulary in the geographic information vocabulary database, adding a new geographic information vocabulary in the geographic information vocabulary database, and/or changing the type of geographic information vocabulary, such as by migrating a non-sensitive vocabulary in the positive vocabulary database to the negative vocabulary database.
In some embodiments, the geographic information vocabulary database further includes version information for representing a current version of the geographic information vocabulary database. The content of the geographic information vocabulary databases of different versions can be different, and the updated geographic information vocabulary databases of each version can be conveniently managed by adding version information.
In some embodiments, the geographic information vocabulary database further includes a file flag for indicating the validity of the geographic information vocabulary database. When in use, the file mark in the geographic information vocabulary database can be read first, and the read file mark is matched with the preset effective file mark. If the matching is successful, the geographic information vocabulary database is an effective geographic information vocabulary database, so that the sensitivity judgment on the target geographic information vocabulary can be performed based on the geographic information vocabulary database.
In some embodiments, each geographic information vocabulary in the geographic information vocabulary database may be encrypted. In particular, a geographic information vocabulary may be converted by an encryption technique into an irreversible digital unique identifier. Encryption means include, but are not limited to, hash algorithms, symmetric encryption algorithms, asymmetric encryption algorithms, and the like. Further, since the geographic information vocabularies in the negative vocabulary database must not be disclosed, only the geographic information vocabularies in the negative vocabulary database may be encrypted. Alternatively, the geographic information vocabularies in both the positive vocabulary database and the negative vocabulary database may be encrypted.
The geographic information vocabulary database of some embodiments is shown in Fig. 2. It comprises header information and table data, wherein the header information comprises the file flag and file version information, and the table data records the geographic information vocabularies, each of which corresponds one-to-one to a digital unique identifier. In the example where the geographic information vocabulary database includes a positive vocabulary database and a negative vocabulary database, both the positive vocabulary database and the negative vocabulary database may employ the data structure shown in Fig. 2.
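A database with header information (file flag, version) and table data of irreversible identifiers might be modeled as below. This is a sketch under stated assumptions: the class name `VocabularyDatabase`, the flag value `"GEOV"`, and the choice of SHA-256 as the hash are illustrative and are not specified by the description.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Iterable, Set

VALID_FILE_FLAG = "GEOV"  # hypothetical preset valid file flag


def unique_id(word: str) -> str:
    """Map a vocabulary to an irreversible identifier (SHA-256 assumed)."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


@dataclass
class VocabularyDatabase:
    file_flag: str  # header: validity marker
    version: str    # header: file version information
    table: Set[str] = field(default_factory=set)  # table data: identifiers only

    @classmethod
    def build(cls, version: str, words: Iterable[str]) -> "VocabularyDatabase":
        return cls(VALID_FILE_FLAG, version, {unique_id(w) for w in words})

    def is_valid(self) -> bool:
        # Compare the stored flag with the preset valid flag before use.
        return self.file_flag == VALID_FILE_FLAG

    def hits(self, target: str) -> bool:
        if not self.is_valid():
            raise ValueError("invalid geographic information vocabulary database")
        # Hash the target the same way as the stored entries, then compare.
        return unique_id(target) in self.table


negative_db = VocabularyDatabase.build("1.0.0", ["XX military base"])
corrupted_db = VocabularyDatabase("????", "1.0.0", set())
```

Because only identifiers are stored, a target vocabulary is hashed in the same way before the membership test, so the plain-text sensitive vocabularies never need to be kept in the file.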
After the target geographic information vocabulary is obtained, it may be determined whether the target geographic information vocabulary hits the geographic information vocabulary database. If the target geographic information vocabulary is the same as any geographic information vocabulary in the geographic information vocabulary database, the target geographic information vocabulary can be determined to hit the geographic information vocabulary database. If the target geographic information vocabulary is different from each geographic information vocabulary in the geographic information vocabulary database, the target geographic information vocabulary can be determined to miss the geographic information vocabulary database.
In the example where the geographic information vocabulary database includes a positive vocabulary database and a negative vocabulary database, it may be determined whether the target geographic information vocabulary hits the positive vocabulary database first, and if not, then whether the target geographic information vocabulary hits the negative vocabulary database. Or determining whether the target geographic information word hits the negative word database, and if not, determining whether the target geographic information word hits the positive word database. Or the step of determining whether the target geographic information vocabulary hits the positive vocabulary database and determining whether the target geographic information vocabulary hits the negative vocabulary database may be performed concurrently.
If the target geographic information vocabulary hits the geographic information vocabulary database, the sensitivity of the target geographic information vocabulary can be determined based on the type of the geographic information vocabulary hit by the target geographic information vocabulary in the geographic information vocabulary database. For example, if the type of the hit geographic information vocabulary is a sensitive vocabulary, the target geographic information vocabulary may be determined to be a sensitive vocabulary. If the type of the hit geographic information vocabulary is a non-sensitive vocabulary, the target geographic information vocabulary can be determined to be the non-sensitive vocabulary.
In the example where the geographic information vocabulary database includes a positive vocabulary database and a negative vocabulary database, if the target geographic information vocabulary hits a non-sensitive vocabulary in the positive vocabulary database, the target geographic information vocabulary may be directly determined to be a non-sensitive vocabulary. Similarly, if the target geographic information vocabulary hits a sensitive vocabulary in the negative vocabulary database, the target geographic information vocabulary may be directly determined to be a sensitive vocabulary.
If the target geographic information vocabulary hits neither the positive vocabulary database nor the negative vocabulary database, it may be determined that the target geographic information vocabulary does not hit the geographic information vocabulary database, and step S13 may then be executed.
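The hit determination against a positive and a negative vocabulary database can be sketched as a simple lookup. The function name `lookup` and the use of plain Python sets are assumptions for illustration; the negative database is checked first here, which is only one of the orders the description permits.

```python
from enum import Enum
from typing import Optional, Set


class Sensitivity(Enum):
    SENSITIVE = "sensitive"
    NON_SENSITIVE = "non-sensitive"


def lookup(
    target: str, positive_db: Set[str], negative_db: Set[str]
) -> Optional[Sensitivity]:
    """Return the sensitivity on a hit, or None when step S13 is needed."""
    if target in negative_db:  # negative database holds sensitive vocabularies
        return Sensitivity.SENSITIVE
    if target in positive_db:  # positive database holds non-sensitive vocabularies
        return Sensitivity.NON_SENSITIVE
    return None  # miss in both databases: fall through to step S13


positive_db = {"XX park"}
negative_db = {"XX military base"}
```

A return value of `None` signals a miss in both databases, which is the condition under which the phrase-level fallback is used.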
In some embodiments, each geographic information word in the geographic information word database is pre-encrypted. In this case, the target geographic information vocabulary may be encrypted based on the encryption manner of each geographic information vocabulary in the geographic information vocabulary database, and then it may be determined whether the encrypted target geographic information vocabulary hits the pre-established geographic information vocabulary database. In this way, the risk of leakage of the geographic information vocabulary in the geographic information vocabulary database can be reduced.
In step S13, a pre-established geographic information phrase database may be obtained, where the geographic information phrase database includes a plurality of geographic information phrases and tag information thereof. Geographic information phrases are the elements that make up geographic information vocabularies, and a plurality of geographic information phrases can form a geographic information vocabulary. A geographic information vocabulary includes, but is not limited to, at least some of the elements representing countries, provinces, cities, regions, streets, building information (e.g., names or numbers of buildings), scenic spot names, and the like. For example, one geographic information vocabulary may be "XX street XX park in XX City XX, XX province". Each geographic information phrase may include one or more of the elements described above, e.g., one geographic information phrase may be "XX park", another may be "XX city XX street", and yet another may be "XX province XX city". A geographic information phrase corresponds to one or more regions, each of which may include multiple points of interest, i.e., the geographic location to which the geographic information phrase corresponds is non-unique and uncertain. For example, the geographic information phrase "XX park" may refer either to an XX park in city A or to an XX park in city B. For another example, the geographic information phrase "XX province XX city" covers a plurality of points of interest in that city, such as government buildings, shopping malls, zoos, and the like.
Because the geographic location corresponding to a geographic information phrase is non-unique and uncertain, it is not possible to directly determine whether the geographic information phrase itself is sensitive or non-sensitive. However, geographic information phrases are the elements that make up geographic information vocabularies, and sensitive vocabularies generally include specific elements such as "army" or "government building", so the probability that a geographic information vocabulary including a given geographic information phrase is a sensitive vocabulary can be approximately inferred. That is, tag information may be established for each geographic information phrase to represent the probability that a geographic information vocabulary including the geographic information phrase is a sensitive vocabulary or a non-sensitive vocabulary. For example, a geographic information vocabulary including the geographic information phrase "army" is highly likely to be a sensitive vocabulary, so the probability represented by the tag information of "army" is high, for example 0.7; a geographic information vocabulary including the geographic information phrase "noodle shop" is unlikely to be a sensitive vocabulary, so the probability represented by the tag information of "noodle shop" is low, for example 0.1. It is to be understood that the numerical values herein are merely exemplary and are not intended to limit the present disclosure.
In some embodiments, the geographic information phrase database may include a positive phrase database and a negative phrase database. The probability represented by the tag information of a geographic information phrase in the positive phrase database is larger than the probability represented by the tag information of a geographic information phrase in the negative phrase database. That is, if the target geographic information vocabulary includes a geographic information phrase in the positive phrase database, the target geographic information vocabulary has a higher probability of being a sensitive vocabulary, and if the target geographic information vocabulary includes a geographic information phrase in the negative phrase database, the target geographic information vocabulary has a lower probability of being a sensitive vocabulary.
The geographic information phrase database may also be updated at a certain frequency, for example, by deleting an existing geographic information phrase in the geographic information phrase database, adding a new geographic information phrase in the geographic information phrase database, and/or modifying tag information of the geographic information phrase.
In some embodiments, the geographic information phrase database further includes version information for representing a current version of the geographic information phrase database. The content of the geographic information phrase databases of different versions can be different, and the updated geographic information phrase databases of each version can be conveniently managed by adding version information.
In some embodiments, the geographic information phrase database further includes a file flag for indicating the validity of the geographic information phrase database. When in use, the file mark in the geographic information phrase database can be read first, and the read file mark is matched with the preset effective file mark. If the matching is successful, the geographical information phrase database is an effective geographical information phrase database, so that the sensitivity judgment on the target geographical information words can be performed based on the geographical information phrase database.
In some embodiments, the geographic information phrase database may also include weights for the geographic information phrase database. The weights employed by different geographic information phrase databases (e.g., positive phrase database and negative phrase database) may be different. By setting the weight for the geographic information phrase database, the influence degree of the label information of the geographic information phrases in the geographic information phrase database on the sensitivity discrimination result can be adjusted.
In some embodiments, the geographic information phrase database may also include weights for individual geographic information phrases. By setting a weight (called a phrase weight) for each geographic information phrase, the degree to which the tag information of that geographic information phrase influences the sensitivity judgment can be adjusted. For example, a geographic information vocabulary including the geographic information phrase "army" is more likely to be a sensitive vocabulary than a geographic information vocabulary including the geographic information phrase "noodle shop", and thus the weight of the geographic information phrase "army" may be greater than the weight of the geographic information phrase "noodle shop".
In some embodiments, each geographic information phrase in the geographic information phrase database, tag information corresponding to the geographic information phrase, a weight of the geographic information phrase database, and a weight corresponding to the geographic information phrase may be encrypted.
The geographic information phrase database of some embodiments is shown in Fig. 3, and includes header information and table data, where the header information includes a file flag, file version information, and the weight of the geographic information phrase database, and the table data is used to record the geographic information phrases, the tag information of the geographic information phrases, and the weights of the geographic information phrases. The weight of the geographic information phrase database, the geographic information phrases, the tag information of the geographic information phrases, and the weights of the geographic information phrases can be encrypted data. In the example where the geographic information phrase database includes a positive phrase database and a negative phrase database, both the positive phrase database and the negative phrase database may employ the data structure shown in Fig. 3.
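The header and table layout described for the phrase database might be represented as follows. The names `PhraseEntry` and `PhraseDatabase`, the flag value, and the concrete numbers are hypothetical; encryption of the fields is omitted here to keep the structure visible.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class PhraseEntry:
    probability: float   # tag information: probability of a sensitive vocabulary
    weight: float = 1.0  # phrase weight

    def __post_init__(self) -> None:
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")


@dataclass
class PhraseDatabase:
    file_flag: str  # header: validity marker
    version: str    # header: file version information
    weight: float   # header: weight of this phrase database
    table: Dict[str, PhraseEntry] = field(default_factory=dict)


# Illustrative values only.
positive_phrases = PhraseDatabase(
    file_flag="GEOP",  # hypothetical flag value
    version="1.0.0",
    weight=0.6,
    table={"army": PhraseEntry(probability=0.7, weight=2.0)},
)
negative_phrases = PhraseDatabase(
    file_flag="GEOP",
    version="1.0.0",
    weight=0.4,
    table={"park": PhraseEntry(probability=0.1)},
)
```

Keeping the database weight in the header and the phrase weight in each table row mirrors the two levels at which the description allows influence on the result to be tuned.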
In step S14, the target geographic information vocabulary may be segmented to obtain a plurality of word segments. The plurality of word segments may include geographic information phrases, and may also include non-geographic-information phrases, including but not limited to names, times, numbers, etc. For example, if the target geographic information vocabulary is "Zhang San Pinggu", segmenting it yields the two word segments "Zhang San" and "Pinggu", where "Zhang San" is a person's name and "Pinggu" is a geographic information phrase.
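The description does not name a segmentation algorithm. As one possible sketch, a greedy forward maximum matching over whitespace-separated tokens can be used, where the longest run of tokens found in a lexicon is emitted as one word segment. The function name `segment`, the lexicon, and the `max_tokens` limit are assumptions made for this example.

```python
from typing import List, Set


def segment(text: str, lexicon: Set[str], max_tokens: int = 4) -> List[str]:
    """Greedy forward maximum matching over whitespace-separated tokens."""
    tokens = text.split()
    segments: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, shrinking down to a single token.
        for size in range(min(max_tokens, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + size])
            # A single token is always accepted so the loop makes progress.
            if size == 1 or candidate in lexicon:
                segments.append(candidate)
                i += size
                break
    return segments


lexicon = {"A province B city", "C park"}  # illustrative phrase lexicon
parts = segment("A province B city C park", lexicon)
mixed = segment("Tom C park", lexicon)
```

Tokens that are not part of any lexicon entry, such as a person's name, are emitted as their own word segments and simply fail to match later on.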
Each of the plurality of word segments may be matched with the geographic information phrases in the geographic information phrase database. In the case that the geographic information phrase database comprises a positive phrase database and a negative phrase database, several matching orders are possible. Each word segment may first be matched with the geographic information phrases in the positive phrase database and, if the match fails, then be matched with the geographic information phrases in the negative phrase database. Alternatively, each word segment may first be matched with the geographic information phrases in the negative phrase database and, if the match fails, then be matched with the geographic information phrases in the positive phrase database. Alternatively, each word segment may be matched with the geographic information phrases in the positive phrase database and in the negative phrase database in parallel.
In some embodiments, each geographic information phrase and its tag information in the geographic information phrase database are pre-encrypted. The geographic information phrases in the geographic information phrase database can be decrypted first, and each word segment in the plurality of word segments is then matched with the decrypted geographic information phrases. In addition, the tag information of the geographic information phrases that are successfully matched with the word segments can be decrypted, and the sensitivity of the target geographic information vocabulary is determined based on the decrypted tag information of each successfully matched geographic information phrase.
For example, each geographic information phrase in the geographic information phrase database may be decrypted first, and the decrypted geographic information phrases may be matched with the plurality of word segments included in the target geographic information vocabulary. Assuming that geographic information phrase a in the geographic information phrase database is successfully matched with a certain word segment of the target geographic information vocabulary, the tag information of geographic information phrase a can be decrypted, and the sensitivity of the target geographic information vocabulary is determined based on the decrypted tag information of geographic information phrase a.
It will be appreciated that the above is but one possible implementation. In other implementations, the method may instead include encrypting the plurality of word segments of the target geographic information vocabulary according to the encryption mode of the geographic information phrases, matching the encrypted word segments with the geographic information phrases in the geographic information phrase database, screening out the successfully matched geographic information phrases, decrypting the tag information of the successfully matched geographic information phrases, and determining the sensitivity of the target geographic information vocabulary based on the decrypted tag information. This approach can reduce leakage of the geographic information phrases and improve data security.
Through the matching process, the geographic information phrases in the geographic information phrase database that are successfully matched with the word segments can be determined, and the sensitivity of the target geographic information vocabulary can be determined based on these geographic information phrases. For example, the target geographic information vocabulary is segmented to obtain the word segments A, B, C, and D, where A and B are successfully matched with geographic information phrase a in the positive phrase database, C is not successfully matched with any geographic information phrase in the geographic information phrase database, and D is successfully matched with geographic information phrase d in the negative phrase database. The sensitivity of the target geographic information vocabulary can then be determined based on the tag information of geographic information phrase a and the tag information of geographic information phrase d.
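The variant in which the word segments are encrypted before matching could look like the sketch below. It assumes the phrases are stored as keyed HMAC-SHA256 digests, which is only one deterministic scheme that would allow such matching. The tag codec shown (`toy_encrypt_tag` and `toy_decrypt_tag`) is a placeholder that merely reverses bytes and is NOT real encryption; a deployment would substitute an actual cipher. All names and the key are hypothetical.

```python
import hashlib
import hmac
from typing import Callable, Dict, List


def phrase_digest(phrase: str, key: bytes) -> str:
    """Deterministic keyed digest of a phrase (HMAC-SHA256 assumed)."""
    return hmac.new(key, phrase.encode("utf-8"), hashlib.sha256).hexdigest()


def match_encrypted(
    segments: List[str],
    encrypted_db: Dict[str, bytes],  # phrase digest -> encrypted tag information
    key: bytes,
    decrypt_tag: Callable[[bytes], float],
) -> Dict[str, float]:
    matched: Dict[str, float] = {}
    for seg in segments:
        digest = phrase_digest(seg, key)
        if digest in encrypted_db:
            # Only the tags of successfully matched phrases are decrypted.
            matched[seg] = decrypt_tag(encrypted_db[digest])
    return matched


# Placeholder codec for demonstration only; it offers no confidentiality.
def toy_encrypt_tag(probability: float) -> bytes:
    return str(probability).encode("ascii")[::-1]


def toy_decrypt_tag(blob: bytes) -> float:
    return float(blob[::-1].decode("ascii"))


key = b"demo-key"  # hypothetical key
encrypted_db = {phrase_digest("army", key): toy_encrypt_tag(0.7)}
matched = match_encrypted(["YY", "army"], encrypted_db, key, toy_decrypt_tag)
```

In this arrangement the stored phrases are never decrypted, and tag information is decrypted only for entries that actually matched.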
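Collecting the tag information of the matched phrases from a positive and a negative phrase database can be sketched as follows. Exact string matching, the positive-first order, and the name `match_segments` are assumptions for this example.

```python
from typing import Dict, List, Tuple


def match_segments(
    segments: List[str],
    positive_db: Dict[str, float],  # phrase -> tag probability
    negative_db: Dict[str, float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return the tags of phrases matched in each database."""
    first: Dict[str, float] = {}
    second: Dict[str, float] = {}
    for seg in segments:
        if seg in positive_db:    # try the positive phrase database first
            first[seg] = positive_db[seg]
        elif seg in negative_db:  # then the negative phrase database
            second[seg] = negative_db[seg]
        # otherwise the segment matches nothing and is ignored
    return first, second


positive_db = {"army": 0.7}
negative_db = {"park": 0.1}
first, second = match_segments(["XX", "army", "Tom", "park"], positive_db, negative_db)
```

Segments that match neither database, such as "XX" and "Tom" here, contribute nothing to the later sensitivity decision.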
In some embodiments, when matching the word segment with the geographic information phrase, if the word segment is the same as a certain geographic information phrase, it may be determined that the word segment is successfully matched with the geographic information phrase. If the word segmentation is different from the geographic information phrase, determining that the word segmentation fails to match with the geographic information phrase.
In other embodiments, the word segment and the geographic information phrase can be matched in two dimensions, pronunciation and glyph (written form), so as to obtain a pronunciation similarity and a glyph similarity between the word segment and the geographic information phrase. A total similarity between the word segment and the geographic information phrase is determined based on the pronunciation similarity and the glyph similarity, and whether the word segment matches the geographic information phrase is determined based on the total similarity. For example, when the total similarity is greater than a preset similarity threshold, it may be determined that the word segment is successfully matched with the geographic information phrase; otherwise, it is determined that the match has failed. In this way, inaccurate matching caused by misspelling or similar pronunciation can be reduced, for example when a user uploading the geographic information vocabulary "XX army" through an interface of an open map service mistypes "army" as a similar-sounding string, or when the in-vehicle voice module mistakenly recognizes "Luo Yang" as "Luoyang".
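A combined pronunciation and glyph similarity might be sketched as below. Several parts are assumptions: `pronunciation_key` is a crude stand-in that lowercases and strips non-alphanumeric characters (a real system would likely use a phonetic transcription), both similarities are computed with `difflib.SequenceMatcher`, and the total similarity is taken as a weighted sum with a hypothetical threshold of 0.85.

```python
from difflib import SequenceMatcher


def pronunciation_key(text: str) -> str:
    # Hypothetical stand-in for a phonetic transcription of the text.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def total_similarity(segment: str, phrase: str, pron_weight: float = 0.5) -> float:
    pron = SequenceMatcher(
        None, pronunciation_key(segment), pronunciation_key(phrase)
    ).ratio()
    glyph = SequenceMatcher(None, segment, phrase).ratio()
    # Assumption: the total similarity is a weighted sum of both dimensions.
    return pron_weight * pron + (1.0 - pron_weight) * glyph


def is_match(segment: str, phrase: str, threshold: float = 0.85) -> bool:
    return total_similarity(segment, phrase) > threshold


same_place = is_match("Luo Yang", "Luoyang")
different = is_match("XX park", "XX army")
```

With these settings, "Luo Yang" and "Luoyang" share the same pronunciation key and are treated as a match, while "XX park" and "XX army" fall below the threshold.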
In some embodiments, each word segment of the plurality of word segments may be matched with a geographic information phrase in a positive phrase database, first tag information of each first geographic information phrase successfully matched with the plurality of word segments in the positive phrase database is determined, each word segment of the plurality of word segments is matched with a geographic information phrase in a negative phrase database, second tag information of each second geographic information phrase successfully matched with the plurality of word segments in the negative phrase database is determined, and sensitivity of the target geographic information phrase is determined based on the first tag information of each first geographic information phrase and the second tag information of each second geographic information phrase.
Following the previous example, the first geographic information phrase successfully matched with the plurality of word segments in the positive phrase database includes a geographic information phrase a, and the second geographic information phrase successfully matched with the plurality of word segments in the negative phrase database includes a geographic information phrase d, so that the sensitivity of the target geographic information vocabulary can be determined based on the tag information of the geographic information phrase a and the tag information of the geographic information phrase d.
Further, a first weight corresponding to the positive phrase database and a second weight corresponding to the negative phrase database may be determined, first tag information of each first geographic information phrase is weighted based on the first weight, second tag information of each second geographic information phrase is weighted based on the second weight, and sensitivity of the target geographic information vocabulary is determined based on the weighted first tag information of each first geographic information phrase and the weighted second tag information of each second geographic information phrase.
Following the previous example, the tag information of geographic information phrase a may be weighted based on the first weight, and the tag information of geographic information phrase d may be weighted based on the second weight. Whether the target geographic information vocabulary is a sensitive vocabulary is then determined based on the weighted tag information of geographic information phrase a and the weighted tag information of geographic information phrase d.
Further, before the sensitivity of the target geographic information vocabulary is determined based on the first tag information of each first geographic information phrase and the second tag information of each second geographic information phrase, a weight of each first geographic information phrase and a weight of each second geographic information phrase may also be determined. The first tag information of each first geographic information phrase is weighted based on the weight of that phrase, and the second tag information of each second geographic information phrase is weighted based on the weight of that phrase.
In some embodiments, the weight of a geographic information phrase is related to its sensitivity level. The sensitivity level of the geographic information phrase may be determined first, and the weight of the phrase is then determined according to that level. The higher the sensitivity level of a geographic information phrase, the higher the probability that a geographic information vocabulary comprising the phrase is a sensitive vocabulary. Different sensitivity levels may correspond to different weights.
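The per-phrase weighting can be illustrated with a short sketch. The three sensitivity levels, the weight assigned to each level, and the name `weighted_tags` are hypothetical; the application only states that weights depend on the sensitivity level.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from sensitivity level to weight (higher level, larger weight).
LEVEL_WEIGHTS: Dict[int, float] = {1: 0.2, 2: 0.5, 3: 1.0}

def weighted_tags(matched: List[Tuple[str, int, float]]) -> Dict[str, float]:
    """Weight each matched phrase's tag probability by its sensitivity-level weight.

    Each item is (phrase, sensitivity_level, probability).
    """
    return {phrase: LEVEL_WEIGHTS[level] * prob for phrase, level, prob in matched}

# Phrase a at the lowest level and phrase d at the highest level (illustrative values).
result = weighted_tags([("phrase_a", 1, 0.8), ("phrase_d", 3, 0.6)])
```

A lookup table keeps the level-to-weight relationship in one place, so the weights can be adjusted without touching the matching logic.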
It will be appreciated that in practical applications, the number of individual geographic information phrases that are successfully matched with multiple word segments in the target geographic information vocabulary may not be limited to the case illustrated in the above example, but the sensitivity of the target geographic information vocabulary may be determined in a manner similar to that in the above embodiment, regardless of the number of geographic information phrases that are successfully matched.
A specific embodiment of the present application and its application scenario will be illustrated with reference to the accompanying drawings.
The application can be used in the fields of intelligent connected vehicles and automatic driving to judge the sensitivity of geographic information vocabulary collected by vehicle-mounted sensors and to output a judgment result. The application mainly comprises three parts: a knowledge base, judgment reference materials and semantic judgment.
The overall system architecture is shown in Fig. 4 and comprises a compliance strong positive word stock, a compliance strong positive word stock comparison table, a compliance strong positive semantic judgment module, a compliance strong negative word stock comparison table, a compliance strong negative semantic judgment module, a compliance weak positive word stock comparison table, a compliance weak positive semantic judgment module, a compliance weak negative word stock comparison table and a compliance weak negative semantic judgment module.
The knowledge base comprises a positive vocabulary database (also called the compliance strong positive word stock), a negative vocabulary database (also called the compliance strong negative word stock), a positive phrase database (also called the compliance weak positive word stock) and a negative phrase database (also called the compliance weak negative word stock). The knowledge base can be continuously accumulated and updated based on the practical experience of expert teams.
The geographic information vocabulary in the compliance strong positive word stock is publicly disclosable, non-sensitive geographic information vocabulary (namely, the non-sensitive vocabulary in the foregoing embodiments) related to the fields of intelligent connected vehicles and automatic driving;
The geographic information vocabulary in the compliance strong negative word stock is non-disclosable, strongly sensitive geographic information vocabulary (namely, the sensitive vocabulary in the foregoing embodiments) related to the fields of intelligent connected vehicles and automatic driving;
The geographic information phrases in the compliance weak positive word stock are publicly disclosable phrases related to the fields of intelligent connected vehicles and automatic driving, which may nevertheless produce an objectionable geographic information vocabulary when combined with a weakly sensitive phrase. The tag information in the compliance weak positive word stock is therefore used to represent the probability that a geographic information vocabulary comprising the geographic information phrase is a non-sensitive vocabulary;
The geographic information phrases in the compliance weak negative word stock are weakly sensitive geographic information phrases related to the fields of intelligent connected vehicles and automatic driving. The tag information in the compliance weak negative word stock is therefore used to represent the probability that a geographic information vocabulary comprising the geographic information phrase is a sensitive vocabulary.
The judgment reference materials comprise two parts: comparison tables and judgment principles. The comparison tables comprise a compliance strong positive word stock comparison table (comprising a plurality of non-sensitive vocabularies), a compliance strong negative word stock comparison table (comprising a plurality of sensitive vocabularies), a compliance weak positive word stock comparison table (comprising a plurality of geographic information phrases and their tag information, where the tag information represents the probability that a geographic information vocabulary comprising the phrase is a non-sensitive vocabulary), and a compliance weak negative word stock comparison table (comprising a plurality of geographic information phrases and their tag information, where the tag information represents the probability that a geographic information vocabulary comprising the phrase is a sensitive vocabulary).
All vocabulary in the compliance strong positive word stock is converted into irreversible digital unique identifiers using a commercial cryptography data encryption protection technology, and the identifiers are stored in a private-format file; this file is the compliance strong positive word stock comparison table. Similarly, all vocabulary in the compliance strong negative word stock may be converted into irreversible digital unique identifiers using the commercial cryptography data encryption protection technology and stored in a private-format file, which is the compliance strong negative word stock comparison table.
The compliance weak word stocks can be continuously accumulated and updated according to the practical experience of expert teams. The sensitivity level of a geographic information phrase is determined according to the degree of objection that may arise when the phrase is combined with phrases in the compliance weak word stocks, and different weights are assigned to geographic information phrases of different sensitivity levels. For example, the sensitivity may be divided into N levels corresponding to N weights. After each update of the compliance weak positive word stock, the weights of the geographic information phrases may be adjusted in coordination with the compliance weak negative word stock. The weight and tag information of each geographic information phrase, the weight of the compliance weak positive word stock, and other information are encrypted and stored in a private-format file, which is the compliance weak positive word stock comparison table. The specific steps are shown in Fig. 5.
The compliance weak negative word stock comparison table is processed in a similar manner, which is not repeated here.
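Building a comparison table of irreversible identifiers can be sketched as below. The application does not name the commercial cryptographic algorithm or the private file format, so this sketch makes two explicit assumptions: SHA-256 from the standard library stands in for the unspecified algorithm, and the table is held in memory as a set rather than written to a file. The sample vocabulary is invented.

```python
import hashlib
from typing import Iterable, Set

def unique_id(vocabulary: str) -> str:
    """Return an irreversible identifier for a vocabulary item.

    SHA-256 is only a stand-in for the unspecified commercial algorithm.
    """
    return hashlib.sha256(vocabulary.encode("utf-8")).hexdigest()

def build_comparison_table(vocabularies: Iterable[str]) -> Set[str]:
    """Store only the identifiers, never the plaintext vocabulary."""
    return {unique_id(v) for v in vocabularies}

# Hypothetical non-sensitive vocabulary for a strong positive comparison table.
strong_positive_table = build_comparison_table(["City Library", "Central Park"])
hit = unique_id("Central Park") in strong_positive_table
miss = unique_id("central park") in strong_positive_table
```

Because the identifier is a digest of the exact string, lookup is an exact-match test: "central park" in lower case does not hit the table entry for "Central Park".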
The various weights involved in the above embodiments may be determined based on scoring judgment principles, namely technical decision rules formed by continuously accumulating and updating the practical experience of expert teams, and depend on the compliance weak positive word stock scoring table and the compliance weak negative word stock scoring table. The scoring and judgment steps of the semantic judgment module likewise depend on the scoring judgment principles.
As shown in Fig. 6, the semantic judgment mainly comprises the following steps: 1) compliance strong positive word stock determination; 2) compliance strong negative word stock determination; 3) compliance weak positive word stock determination; 4) compliance weak negative word stock determination; and 5) outputting the determination result. The determination result indicates whether the input geographic information vocabulary may be subject to activities such as disclosure or transmission.
① Compliance strong positive word stock determination
An irreversible digital unique identifier is generated for the input target geographic information vocabulary using the commercial cryptography data encryption protection technology, and it is judged whether the identifier is contained in the compliance strong positive word stock comparison table. If so, the target geographic information vocabulary is determined to be a non-sensitive vocabulary, the determination result is returned, and the determination process ends; if not, the process proceeds to step ②;
② Compliance strong negative word stock determination
An irreversible digital unique identifier is generated for the input target geographic information vocabulary using the commercial cryptography data encryption protection technology, and it is judged whether the identifier is contained in the compliance strong negative word stock comparison table. If so, the target geographic information vocabulary is determined to be a sensitive vocabulary, the determination result is returned, and the determination process ends; if not, the process proceeds to step ③;
③ Compliance weak positive word stock determination
The compliance weak positive word stock scoring table is traversed, and the input target geographic information vocabulary is matched by regular expression against each decrypted geographic information phrase in the scoring table to obtain the weight and probability (namely, the tag information) of each geographic information phrase. The probabilities corresponding to the successfully matched geographic information phrases in the compliance weak positive word stock scoring table are accumulated to obtain an accumulated total score, where the dynamic scoring formula is as follows:
Where y_positive represents the accumulated total score of the successfully matched geographic information phrases in the compliance weak positive word stock scoring table, α1, α2, …, αN represent the weights of the N geographic information phrases in the compliance weak positive word stock scoring table, and x1, x2, …, xN represent the probabilities of the corresponding N geographic information phrases in the compliance weak positive word stock scoring table, each determined based on the sensitivity level of the corresponding geographic information phrase.
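The scoring formula itself is not reproduced in this text. The sketch below assumes the form most consistent with the definitions above, a weighted sum y_positive = α1·x1 + α2·x2 + … + αN·xN taken over the successfully matched phrases; this form is an assumption, not a quotation. The table contents and the name `accumulated_score` are hypothetical, and phrases are matched literally via `re.escape`, although a real table could store genuine patterns.

```python
import re
from typing import List, Tuple

# Each entry: (decrypted phrase, weight alpha_i, probability x_i). Values are hypothetical.
WEAK_POSITIVE_TABLE: List[Tuple[str, float, float]] = [
    ("garden", 0.5, 0.9),
    ("museum", 0.5, 0.8),
    ("harbor", 1.0, 0.4),
]

def accumulated_score(vocabulary: str, table: List[Tuple[str, float, float]]) -> float:
    """Sum weight * probability over the phrases found in the vocabulary."""
    total = 0.0
    for phrase, weight, prob in table:
        # Regular-expression match of the phrase against the input vocabulary.
        if re.search(re.escape(phrase), vocabulary):
            total += weight * prob
    return total

# "museum" and "harbor" match; "garden" does not.
y_positive = accumulated_score("north harbor museum", WEAK_POSITIVE_TABLE)
```

Under the same assumption, the function could be applied unchanged to a weak negative scoring table to obtain y_negative.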
④ Compliance weak negative word stock determination
The compliance weak negative word stock scoring table is traversed, and the input target geographic information vocabulary is matched by regular expression against each decrypted geographic information phrase in the scoring table to obtain the dimension weight and probability (namely, the tag information) of each geographic information phrase. The probabilities corresponding to the successfully matched geographic information phrases in the compliance weak negative word stock scoring table are accumulated to obtain an accumulated total score, where the dynamic scoring formula is as follows:
Where y_negative represents the accumulated total score of the successfully matched geographic information phrases in the compliance weak negative word stock scoring table, β1, β2, …, βM represent the weights of the M geographic information phrases in the compliance weak negative word stock scoring table, and y1, y2, …, yM represent the probabilities of the corresponding M geographic information phrases in the compliance weak negative word stock scoring table, each determined based on the sensitivity level of the corresponding geographic information phrase.
⑤ Outputting the determination result
For a given target geographic information vocabulary, the sensitivity of the target geographic information vocabulary is determined based on the accumulated total score of the compliance weak positive word stock and the accumulated total score of the compliance weak negative word stock, and the specific formula is as follows:
Where γ_positive and γ_negative respectively represent the weight of the compliance weak positive word stock and the weight of the compliance weak negative word stock, and S_total represents the probability that the target geographic information vocabulary is a sensitive vocabulary.
The weight of the compliance weak positive word stock and the weight of the compliance weak negative word stock are used to keep the value ranges of the accumulated total scores consistent as the knowledge base is dynamically updated. These weights are obtained respectively from the compliance weak positive word stock scoring table file and the compliance weak negative word stock scoring table file, and are updated dynamically as the two word stocks are synchronized.
Based on the total score S_total and the built-in decision principle, a unique determination result can be given, which indicates whether the target geographic information vocabulary may be subject to activities such as disclosure or transmission. For example, when S_total is greater than a certain threshold, it may be determined that the target geographic information vocabulary is a sensitive vocabulary and may not be disclosed or transmitted; when S_total is less than or equal to the threshold, it may be determined that the target geographic information vocabulary is a non-sensitive vocabulary and may be disclosed or transmitted.
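Steps ① to ⑤ can be strung together as in the sketch below. Several parts are assumptions rather than the application's method: SHA-256 stands in for the unspecified commercial cryptographic algorithm, the weak-stock scores are taken as weighted sums over matched phrases, and, because the formula for S_total is not reproduced in this text, S_total is assumed to be the normalized share γ_negative·y_negative / (γ_positive·y_positive + γ_negative·y_negative). All names, table contents, weights and the threshold of 0.5 are hypothetical.

```python
import hashlib
import re
from typing import List, Set, Tuple

Table = List[Tuple[str, float, float]]  # (phrase, weight, probability)

def uid(text: str) -> str:
    """Irreversible identifier; SHA-256 is a stand-in for the real algorithm."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def score(vocabulary: str, table: Table) -> float:
    """Assumed weighted sum over the phrases found in the vocabulary."""
    return sum(w * p for phrase, w, p in table
               if re.search(re.escape(phrase), vocabulary))

def judge(vocabulary: str, strong_pos: Set[str], strong_neg: Set[str],
          weak_pos: Table, weak_neg: Table,
          gamma_pos: float = 1.0, gamma_neg: float = 1.0,
          threshold: float = 0.5) -> str:
    digest = uid(vocabulary)
    if digest in strong_pos:                 # step 1: strong positive hit
        return "non-sensitive"
    if digest in strong_neg:                 # step 2: strong negative hit
        return "sensitive"
    y_pos = score(vocabulary, weak_pos)      # step 3: weak positive score
    y_neg = score(vocabulary, weak_neg)      # step 4: weak negative score
    denom = gamma_pos * y_pos + gamma_neg * y_neg
    # Assumed combination; with no matched phrase at all, S_total defaults to 0.
    s_total = gamma_neg * y_neg / denom if denom > 0 else 0.0
    return "sensitive" if s_total > threshold else "non-sensitive"  # step 5

# Hypothetical tables.
strong_pos = {uid("Central Park")}
strong_neg = {uid("Base 7")}
weak_pos: Table = [("garden", 1.0, 0.8)]
weak_neg: Table = [("depot", 1.0, 0.9)]
verdicts = [judge(v, strong_pos, strong_neg, weak_pos, weak_neg)
            for v in ["Central Park", "Base 7", "garden depot", "rose garden"]]
```

Treating a vocabulary with no matched phrase as non-sensitive is a design choice of this sketch; a stricter deployment might instead route such inputs to manual review.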
The application constructs a multidimensional knowledge base based on the practical experience accumulated by expert teams over a long period. The knowledge base covers a wide range of industry knowledge and experience, stays synchronized with industry development through continuous updating by the expert teams, and can cope with changing hazard identification requirements.
Based on this multidimensional knowledge base, the application further derives judgment reference materials and semantic judgment logic. These combine the expertise and experience within the industry to evaluate the input vocabulary comprehensively from multiple perspectives. By scoring the sensitivity of the vocabulary in multiple dimensions, the potential hazard level of the vocabulary can be determined more accurately.
To further improve the accuracy of evaluation, the application designs a rigorous semantic judgment module. This module does not simply perform vocabulary matching; it evaluates whether a vocabulary constitutes a hazard by assigning different weights and probabilities to different geographic information phrases. A multi-step, comprehensive scoring mechanism is adopted, and the sensitivity level of the vocabulary is taken into account, so that a more accurate and reliable judgment result is obtained.
As shown in fig. 7, the present application further provides a device for determining sensitivity of a geographic information vocabulary, where the device includes:
a first obtaining module 101, configured to obtain a target geographic information vocabulary;
a determining module 102, configured to determine whether the target geographic information vocabulary hits a pre-established geographic information vocabulary database, where the geographic information vocabulary database includes a plurality of geographic information vocabularies, each geographic information vocabulary corresponds to a point of interest, the point of interest represents a unique and definite geographic location, and the plurality of geographic information vocabularies include sensitive vocabularies and non-sensitive vocabularies;
a second obtaining module 103, configured to obtain a pre-established geographic information phrase database if the target geographic information vocabulary does not hit the geographic information vocabulary database, where the geographic information phrase database includes a plurality of geographic information phrases and their tag information, a plurality of geographic information phrases can form a geographic information vocabulary, and the tag information of a geographic information phrase represents the probability that a geographic information vocabulary including the phrase is a sensitive vocabulary or a non-sensitive vocabulary;
a matching module 104, configured to segment the target geographic information vocabulary to obtain a plurality of word segments, match each of the word segments with the geographic information phrases in the geographic information phrase database, and determine the sensitivity of the target geographic information vocabulary based on the tag information of each geographic information phrase in the geographic information phrase database that is successfully matched with a word segment.
The functions or modules included in the apparatus provided by the present application may be used to perform the methods described in the foregoing method embodiments, and specific implementation of the methods may refer to the descriptions in the foregoing method embodiments, which are not repeated herein for brevity.
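The segmentation performed by the matching module is not specified in this section. As one possible illustration, the sketch below uses greedy forward maximum matching against a lexicon of known phrases; this technique, the function name `segment`, the `max_len` parameter and the sample strings are all assumptions introduced here for illustration.

```python
from typing import List, Set

def segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching.

    At each position, take the longest lexicon entry of at most max_len
    characters; if none matches, emit a single character.
    """
    segments: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                segments.append(piece)
                i += size
                break
    return segments

# Hypothetical lexicon entries and inputs.
first = segment("abcde", {"ab", "cde"})
second = segment("xab", {"ab"})
```

The resulting segments would then be passed to the phrase matching described in the method embodiments.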
The embodiment of the application also provides a computer device, which at least comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to implement the method of any of the previous embodiments.
Fig. 8 shows a more specific hardware architecture of a computer device according to an embodiment of the present application. The device may include a processor 201, a memory 202, an input/output interface 203, a communication interface 204, and a bus 205. The processor 201, the memory 202, the input/output interface 203 and the communication interface 204 are communicatively coupled to each other within the device via the bus 205.
The processor 201 may be implemented by a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided by the embodiments of the present application. The processor 201 may also include a graphics card, such as an NVIDIA TITAN X graphics card or a 1080Ti graphics card.
The memory 202 may be implemented in the form of read-only memory (ROM), random access memory (RAM), a static storage device, a dynamic storage device, or the like. The memory 202 may store an operating system and other application programs. When the technical solutions provided by the embodiments of the present application are implemented by software or firmware, the associated program code is stored in the memory 202 and invoked for execution by the processor 201.
The input/output interface 203 is used to connect with an input/output module to realize information input and output. The input/output module may be configured as a component in the device (not shown in the figure) or may be external to the device to provide corresponding functionality. The input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The communication interface 204 is used to connect with a communication module (not shown in the figure) to enable communication between the present device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi or Bluetooth).
Bus 205 includes a path to transfer information between components of the device (e.g., processor 201, memory 202, input/output interface 203, and communication interface 204).
It should be noted that, although the above device only shows the processor 201, the memory 202, the input/output interface 203, the communication interface 204, and the bus 205, in the implementation, the device may further include other components necessary for achieving normal operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary for implementing the embodiments of the present application, and not all the components shown in the drawings.
Embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present application.
The embodiment of the present application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the method according to any of the previous embodiments.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computer device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The embodiments of the present application are described in a progressive manner; for the same or similar parts, the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the device embodiments are substantially similar to the method embodiments, their description is relatively brief, and reference may be made to the description of the method embodiments for relevant points. The apparatus embodiments described above are merely illustrative: the modules illustrated as separate components may or may not be physically separate, and the functions of the modules may be implemented in one or more pieces of software and/or hardware when implementing the embodiments of the present application. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present application without creative effort.
The foregoing is merely illustrative of the principles of this application, and it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of this application.