Disclosure of Invention
To address the shortcomings of the prior art, the invention combines geographic location data of the indoor independent subspaces within an indoor space with up-to-date social multimedia data from the Internet in order to quantify the value of the indoor space.
The technical scheme adopted by the invention for solving the technical problems is as follows:
The method relies on spatial data and social multimedia data, applies an indoor space model, reachability theory, and emotion analysis, and computes the semantic value of each indoor independent subspace accurately and quickly from the two aspects of technology and data.
The method comprises the following steps: first, the semantic value of an indoor independent subspace is divided into two sub-problems, a location contribution degree and a social contribution degree; the two are calculated separately and then fused to obtain the semantic value of the indoor independent subspace.
The location contribution degree of the indoor independent subspace is calculated as follows:
A) analyzing the topological connection relations among the indoor independent subspaces from the two-dimensional data of the indoor space by using a generation algorithm, and constructing an indoor space model;
B) calculating the distances between objects in the indoor space model by using a distance measurement method, and at the same time calculating the selection probability of each walking path of a user in the indoor space according to the user walking constraint, so as to construct a path matrix;
C) calculating the location contribution degree of the indoor independent subspace by using the path matrices of the different floors.
The social contribution degree of the indoor independent subspace is calculated in the following mode:
a) analyzing sentence structures of social evaluation texts and extracting keywords for weighting according to social multimedia data of indoor independent subspaces on the Internet to construct an emotion classification base classifier;
b) a key sentence extraction algorithm is adopted, a sentence set of the social evaluation text is divided into key sentences and non-key sentences, and the social evaluation text is classified by combining a classifier fusion method;
c) calculating the social contribution degree of each indoor independent subspace according to the classification of its social evaluation texts.
The location contribution degree of the indoor independent subspace is calculated by a location contribution degree calculation module, which comprises an SVG map acquisition module, an indoor space modeling module, and a location calculation module, as follows:
Step 1.1, obtaining an indoor map of the indoor space in SVG format by tracing in the SVG map acquisition module, and parsing the indoor map into a set of geometric elements;
an indoor map covering multiple floors is split into single-floor indoor maps, which are processed separately.
Step 1.2, in the indoor space modeling module, analyzing the set of geometric elements with a generation algorithm to obtain the topological connection relations among the geometric elements, wherein the topological connection relations comprise the polygon elements, the connecting doors of the polygon elements, and the connectivity among the polygon elements;
Step 1.3, calculating the location contribution degree of the indoor independent subspace with the location calculation module:
Step 1.3.1: adding user walking constraints on the basis of the topological connection relations among the geometric elements, and constructing a constrained indoor space distance model;
Step 1.3.2: using the two data structures D2R and R2D as a hybrid index to compute the distance $dis_i$ of each sub-path between different polygon elements;
Step 1.3.3: calculating the selection probability $p_k$ of each walking path from outside the indoor space to the indoor independent subspace, wherein each walking path is composed of sub-paths between different polygon elements;
Step 1.3.4: for each indoor independent subspace, combining the path lengths and selection probabilities of the different walking paths to establish a path matrix data structure:
for the i-th indoor independent subspace on the j-th floor, the following path matrix is established:
$$\begin{pmatrix} p_1 & dis_1 \\ p_2 & dis_2 \\ \vdots & \vdots \\ p_n & dis_n \end{pmatrix}$$
wherein: $p_k$ denotes the selection probability of the k-th walking path that may be chosen when walking from the different external entrances of the indoor space to the indoor independent subspace; $dis_k$ denotes the path length of that k-th walking path; k = 1, …, n, where k is the ordinal number of the path and n is the total number of paths;
Step 1.3.5: treating users as a resource shared by all indoor independent subspaces, the size of the resource is represented by the path selection probability $p$, which is based on user behavior, and the distance between the resource and the target point is represented by the path distance $dis$ computed from the graph model. The location contribution degree of the indoor independent subspace is calculated from the path matrix of all possible walking paths reaching it, using the following formula:
wherein $Value(store_i)$ denotes the location contribution degree of the i-th indoor independent subspace; $Value(store_{jk})$ denotes the location contribution degree of the k-th walking path passing through the j-th floor; $\beta$ denotes the attenuation coefficient and is typically set to 1, 1.2, or 1.5; and the distance-dependent term of the formula indicates the intensity of attenuation with distance.
The larger the calculated value, the more accessible the indoor independent subspace and the higher its location contribution degree.
The user walking constraint in step 1.3 refers to the movement constraint of a user from a corridor to the destination indoor independent subspace within the indoor space.
The social contribution degree of the indoor independent subspace is calculated by a social contribution degree calculation module, which comprises an Internet crawler module, a text processing module, and an emotion analysis module, as follows:
the social multimedia data includes news data, social evaluation text data, marketing data, and the like.
Step 2.1, acquiring a social media data set for each indoor independent subspace from the Internet by using the Internet crawler module;
Step 2.2, the social media data set contains social evaluation texts; the text processing module de-duplicates and vectorizes the social evaluation texts, and dependency syntax analysis is used to analyze the sentence structures in the social evaluation texts and to perform part-of-speech tagging;
Step 2.3, evaluating and calculating the social contribution degree with the emotion analysis module:
Step 2.3.1: calculating the TF-IDF feature value of each word in each social evaluation text by using the term frequency-inverse document frequency (TF-IDF) algorithm;
Step 2.3.2: applying an emotional word weighting algorithm based on dependency syntax analysis. Specified relation pairs are extracted from the sentence structure of the social evaluation text; if a specified relation pair is extracted, the TF-IDF feature value of its key emotional word is multiplied by an enhancement parameter b; if no specified relation pair is extracted, the TF-IDF feature value of the key emotional word is left unchanged;
Step 2.3.3: applying a key sentence extraction algorithm suitable for Chinese text. All sentences in each social evaluation text are divided into independent, non-overlapping parts according to the keyword attribute, the emotional word attribute, the punctuation mark attribute, and the position attribute of the sentences. The social evaluation text $d$ is divided into a series of sentences: $d = \{s_1, s_2, \ldots, s_m\}$, where m denotes the number of sentences and $s_i$ denotes the i-th sentence;
each sentence $s_i$ is divided into a series of words, and the evaluation value of each sentence is calculated using the following formula:
wherein $f(s_i)$ denotes the attribute weight of the i-th sentence; $\lambda_1$ denotes the keyword attribute weight; $\lambda_2$ denotes the emotional word attribute weight; $\lambda_3$ denotes the punctuation mark attribute weight; $\lambda_4$ denotes the weight of the position attribute of the sentence in the social evaluation text; and the four remaining terms of the formula denote, respectively, the evaluation value of the keyword attribute, the evaluation value of the emotional word attribute, the evaluation value of the punctuation mark attribute, and the evaluation value of the position attribute;
finally, the sentence with the highest evaluation value in the social evaluation text is taken as the emotion key sentence of that social evaluation text;
Step 2.3.4: training an emotion key sentence classifier $f_1$, a non-emotion key sentence classifier $f_2$, and a full-text classifier $f_3$ on all labeled social evaluation text samples, and labeling the unlabeled social evaluation texts of the indoor independent subspace by classifier fusion, wherein labeling means assigning an emotional tendency, either positive or negative, to a social evaluation text, specifically:
training the emotion key sentence classifier $f_1$ on the emotion key sentences of the labeled social evaluation texts, and applying the trained $f_1$ to an unlabeled social evaluation text to obtain a first group of emotional tendency probability values for that text;
training the non-emotion key sentence classifier $f_2$ on the sentences of the labeled social evaluation texts other than the emotion key sentences, and applying the trained $f_2$ to the unlabeled social evaluation text to obtain a second group of emotional tendency probability values;
training the full-text classifier $f_3$ on the labeled social evaluation texts as a whole, and applying the trained $f_3$ to the unlabeled social evaluation text to obtain a third group of emotional tendency probability values;
finally, selecting from the three groups the maximum probability value of the positive emotional tendency and the maximum probability value of the negative emotional tendency, and comparing them: if the maximum probability value of the positive emotional tendency is greater than or equal to that of the negative emotional tendency, the social evaluation text is labeled as a good review; if it is smaller, the social evaluation text is labeled as a bad review;
Step 2.3.5: calculating the social contribution degree $S$ of the indoor independent subspace using the following formula:
$N = u + v$
$p = u / N$
wherein $S$ denotes the social contribution degree of the indoor independent subspace, $u$ denotes the total number of good reviews among all social evaluation texts, $v$ denotes the total number of bad reviews among all social evaluation texts, $N$ denotes the total number of social evaluation texts, $p$ denotes the good review rate, and $z_\alpha$ is the quantile of the normal distribution.
The invention has the beneficial effects that:
the method applies space modeling, reachability theory, emotion analysis, and similar methods, and, from the two aspects of technology and data, constructs a set of evaluation algorithms and models for calculating the semantic value of indoor independent subspaces;
an indoor space model of a shopping center with customer walking constraints is constructed automatically, and the calculation of the distance model is optimized with a hybrid index method;
in emotion classification, the accuracy and recall are improved by the emotional word weighting classifier construction method based on dependency syntax analysis and by the key sentence extraction and classifier fusion algorithm based on semi-supervised learning.
Detailed Description
The technical solution of the present invention will now be further explained with reference to specific embodiments and examples.
Referring to fig. 1, the embodiment of the present invention and its implementation are as follows:
in order to evaluate and predict different store values in a shopping center, the problem can be decomposed into two parts, one part is calculation of the store location contribution degree based on geographical accessibility, the other part is calculation of the store social contribution degree based on brand effect, and then the calculation results of the two parts are fused to obtain the comprehensive location contribution degree of the store.
Step 1: in the store location contribution degree calculation module, inputting an original plane map provided by a shopping center, and outputting the location contribution degree of each store, specifically:
the shop location contribution degree calculation module comprises an SVG map acquisition module, an indoor space modeling module and a location calculation module.
1.1, acquiring an indoor map of a shopping center in an SVG format by an SVG map acquisition module in a tracing mode, and analyzing the indoor map into a set consisting of various geometric elements;
in a specific implementation, the SVG map acquisition module may use the cool happy space map editor.
1.2, in an indoor space modeling module, analyzing a set formed by various geometric elements through a generation algorithm based on Rtree to obtain a topological connection relation among the geometric elements, wherein the topological connection relation comprises a polygonal element, a communication door of the polygonal element and a communication relation among the polygonal elements;
in a specific implementation, the polygon elements include shops, corridors, facilities, escalators, atrium and the like, and the communication relationship between the polygon elements is, for example, the communication relationship between the shops and the corridors.
1.3, location calculation module
As shown in fig. 2, the specific steps are as follows:
step 1.3.1: adding customer walking constraints on the basis of topological connection relations among all geometric elements, and constructing and obtaining an indoor space distance model with constraints;
the customer walking restriction refers to the movement restriction of the customer from the corridor to the target shop in the indoor space of the shopping center.
Step 1.3.2: using the two data structures D2R and R2D as a hybrid index to compute the distance $dis_i$ of each sub-path between different polygon elements;
In a specific embodiment, the sub-paths are, for example, from the main communication door to the escalator, from the escalator to the store, from the main communication door to the store, and the like.
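The hybrid index of step 1.3.2 can be illustrated with a small sketch. The text does not define D2R and R2D, so the reading used here is an assumption: R2D maps each room (polygon element) to the doors on its boundary, and D2R maps each door to the rooms it connects. The floor layout, the distances, and the function names are hypothetical, and Dijkstra's algorithm is swapped in as one plausible way to obtain door-to-door distances.

```python
import heapq

# Hypothetical toy floor; names and distances are illustrative only.
# R2D (assumed meaning): room -> doors on its boundary.
R2D = {
    "corridor": ["gate", "door_A", "escalator"],
    "shop_A": ["door_A"],
}
# D2R (assumed meaning): door -> rooms it connects, derived by inverting R2D.
D2R = {}
for room, doors in R2D.items():
    for door in doors:
        D2R.setdefault(door, []).append(room)

# Assumed walking distance between two doors of the same room.
INTRA_ROOM_DIST = {
    ("corridor", "gate", "door_A"): 30.0,
    ("corridor", "gate", "escalator"): 50.0,
    ("corridor", "door_A", "escalator"): 25.0,
}


def door_distance(room, a, b):
    """Distance between doors a and b inside one room, or None if unknown."""
    if (room, a, b) in INTRA_ROOM_DIST:
        return INTRA_ROOM_DIST[(room, a, b)]
    return INTRA_ROOM_DIST.get((room, b, a))


def shortest_door_path(source, target):
    """Dijkstra over doors: D2R gives the rooms a door opens into,
    R2D gives the doors reachable through each of those rooms."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, door = heapq.heappop(heap)
        if door == target:
            return dist
        if dist > best.get(door, float("inf")):
            continue  # stale heap entry
        for room in D2R[door]:
            for nxt in R2D[room]:
                if nxt == door:
                    continue
                step = door_distance(room, door, nxt)
                if step is None:
                    continue
                cand = dist + step
                if cand < best.get(nxt, float("inf")):
                    best[nxt] = cand
                    heapq.heappush(heap, (cand, nxt))
    return float("inf")


gate_to_shop = shortest_door_path("gate", "door_A")
gate_to_escalator = shortest_door_path("gate", "escalator")
```

Keeping both directions of the mapping means each expansion step is two dictionary lookups, which is presumably the motivation for a hybrid index.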
Step 1.3.3: calculating the selection probability $p_k$ of each walking path of a customer from outside the shopping center to the shop, wherein each walking path is composed of sub-paths between different polygon elements;
for example, the probability of a customer's complete path from outside the shopping center to the second floor is calculated as: (the probability of choosing a given gate of the shopping center) × (the probability of choosing the escalator after entering through that gate) × (the probability of going up after reaching the escalator).
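As a minimal sketch of this calculation, assuming the total path probability is the product of the successive choice probabilities, the numbers below are hypothetical and `path_probability` is an illustrative name:

```python
from math import prod


def path_probability(choice_probs):
    """Probability of a full walking path as the product of its
    successive choice probabilities."""
    return prod(choice_probs)


# Hypothetical values: choose a gate (0.6), then the escalator (0.5),
# then go up (0.4).
p_second_floor = path_probability([0.6, 0.5, 0.4])
```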
Step 1.3.4: for each shop, combining the path lengths and the selection probabilities of different walking paths, establishing a data structure of a path matrix:
for the i-th shop on the j-th floor, the following path matrix is established:
$$\begin{pmatrix} p_1 & dis_1 \\ p_2 & dis_2 \\ \vdots & \vdots \\ p_n & dis_n \end{pmatrix}$$
wherein: $p_k$ denotes the selection probability of the k-th walking path that a customer may choose when walking from the different external entrances of the shopping center to the shop; $dis_k$ denotes the path length of that k-th walking path; k = 1, …, n, where k is the ordinal number of the path and n is the total number of paths;
(1) Number of rows n of the matrix: counting all external entrances, a total of n walking paths can reach the shop, where n depends on the floor j. The path matrices of shops on the same floor have the same number of rows n. The number of rows n corresponds to the selectable strength of the location influence factors.
(2) First column $p_k$ of the matrix: $p_k$ denotes the selection probability of the k-th walking path that may be chosen when walking from the different external entrances of the shopping center to the shop. Shops located on the same floor j have the same first column $p_k$. The matrix element $p_k$ represents the size of the target population among the location influence factors.
(3) Second column $dis_k$ of the matrix: $dis_k$ denotes the path length of the k-th walking path that may be chosen when walking from the different external entrances of the shopping center to the shop. Different shops i located on the same floor j have different path distances. The matrix element $dis_k$ represents the strength of reachability among the location influence factors.
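A path matrix of this shape can be held as a plain list of rows. The sketch below assumes that the length of a walking path is the sum of the lengths of its sub-paths; the function name and the sample values are hypothetical.

```python
def build_path_matrix(paths):
    """Build an n x 2 path matrix with one row [p_k, dis_k] per walking path.

    `paths` is a list of (p_k, sub_path_lengths) pairs; dis_k is taken as
    the sum of the sub-path lengths (an assumption of this sketch).
    """
    return [[p_k, sum(sub_lengths)] for p_k, sub_lengths in paths]


# Two hypothetical paths to one shop: gate -> escalator -> shop door.
matrix = build_path_matrix([
    (0.12, [50.0, 12.0, 25.0]),
    (0.08, [40.0, 12.0, 60.0]),
])
```

Each row keeps the probability and the length of one path together, which is what the later contribution calculation consumes.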
Step 1.3.5: calculating the location contribution degree of the shop from the path matrix of all possible walking paths reaching the shop, using the following formula:
wherein $Value(store_i)$ denotes the location contribution degree of the i-th shop; $Value(store_{jk})$ denotes the location contribution degree of the k-th walking path passing through the j-th floor; $\beta$ denotes the attenuation coefficient and is generally set to 1, 1.2, or 1.5; and the distance-dependent term of the formula indicates the intensity of attenuation with distance.
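The formula itself is not reproduced in this text. The sketch below is therefore an assumption: it uses a gravity-style form in which each path contributes $p_k \cdot dis_k^{-\beta}$ and the contributions are summed over all n paths. This matches the surrounding definitions (a per-path contribution, an attenuation coefficient $\beta$, decay with distance) but may differ from the original formula.

```python
def location_contribution(path_matrix, beta=1.0):
    """Assumed form: sum over paths of p_k * dis_k ** (-beta).

    `path_matrix` is a list of [p_k, dis_k] rows with dis_k > 0.
    """
    return sum(p_k * dis_k ** (-beta) for p_k, dis_k in path_matrix)


near_shop = location_contribution([[0.5, 10.0], [0.5, 20.0]], beta=1.0)
far_shop = location_contribution([[0.5, 40.0], [0.5, 80.0]], beta=1.0)
```

Under this assumed form, a larger $\beta$ penalizes long paths more strongly, so the choice among 1, 1.2, and 1.5 controls how quickly reachability falls off with distance.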
Step 2: in the store social contribution degree calculation module, the social multimedia data of all stores in the same shopping center on the internet is input, and the social contribution degree evaluation and ranking of all stores are output.
The shop social contribution degree calculation module comprises an internet crawler module, a text processing module and an emotion analysis module.
2.1, acquiring a social media data set of each shop under an internet portal website by adopting an internet crawler module as a problem input;
in a specific implementation, the social media data set employs a collection of user comments.
2.2, the social media data set contains user comments, then a text processing module is used for carrying out duplication elimination on the user comments and text vectorization on the user comments, and dependency syntax is used for analyzing sentence structures in the user comments and carrying out part-of-speech tagging;
2.3, an emotion analysis module is used to evaluate and calculate the social contribution degree; the module uses supervised and semi-supervised learning techniques, and the specific algorithm flow is shown in fig. 3.
Step 2.3.1: calculating the TF-IDF characteristic value of each word in each user comment by using a word frequency inverse document frequency TF-IDF algorithm;
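A minimal TF-IDF sketch follows. It assumes the comments have already been segmented into words, and it uses one common variant (term frequency normalized by comment length, inverse document frequency as log(N / df)); the text does not state which variant is used.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {word: tf-idf} dict per tokenized comment."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))  # count each word once per comment
    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical, already-segmented comments.
weights = tf_idf([["dish", "delicious"], ["dish", "slow"]])
```

A word that occurs in every comment, such as "dish" here, receives weight zero, which is why the later steps boost emotional words explicitly.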
step 2.3.2: an emotional word weighting algorithm based on dependency syntactic analysis is proposed. Extracting a specified relation pair from a sentence structure of the user comment, and if the specified relation pair is extracted, multiplying the TF-IDF characteristic value of the key emotional word by an enhancement parameter b to enhance; if the appointed relation pair is not extracted, the TF-IDF characteristic value of the key emotional word is not processed;
in a specific implementation, the specified relationship pair is extracted by adopting a dependency relationship-based emotion word extraction rule table which is specifically shown as follows:
TABLE 1 Emotion term extraction rule Table based on dependency relationship
In the above table, & denotes that both relations hold simultaneously, adj denotes an adjective, n denotes a noun, adv denotes an adverb, and vt denotes a verb.
For example, from the comment "the piece of clothing looks good", the pair "(clothing, look)" is extracted, which belongs to the first category in the rules of the above table; from the comment "the dishes are very delicious and strongly recommended", the pair "(dish, recommendation)" is extracted, which belongs to the fifth category in the rules of the above table.
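The weighting of step 2.3.2 can be sketched as follows. The sketch assumes that the second element of each extracted relation pair is the key emotional word, and the value of the enhancement parameter b is hypothetical; the dependency parsing that produces the pairs is outside the sketch.

```python
def boost_emotion_words(tfidf, relation_pairs, b=2.0):
    """Multiply the TF-IDF value of each key emotional word by b.

    `tfidf` maps word -> TF-IDF value for one comment; `relation_pairs`
    holds (target, emotional_word) pairs extracted from that comment.
    Words not covered by any pair are left unchanged.
    """
    emotion_words = {emotion for _target, emotion in relation_pairs}
    return {
        word: value * b if word in emotion_words else value
        for word, value in tfidf.items()
    }


boosted = boost_emotion_words(
    {"dish": 0.2, "recommendation": 0.3},
    [("dish", "recommendation")],
    b=2.0,
)
```

Collecting the emotional words into a set first ensures a word that appears in several pairs is boosted only once.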
Step 2.3.3: a key sentence extraction algorithm suitable for Chinese text is proposed.
And dividing all sentences in each user comment into independent and non-coincident parts according to the keyword attribute, the emotional word attribute, the punctuation mark attribute and the position attribute of the sentences.
The user comment $d$ is divided into a series of sentences: $d = \{s_1, s_2, \ldots, s_m\}$, where m denotes the number of sentences and $s_i$ denotes the i-th sentence;
each sentence $s_i$ is divided into a series of words, and the evaluation value of each sentence is calculated using the following formula:
wherein $f(s_i)$ denotes the attribute weight of the i-th sentence; $\lambda_1$ denotes the keyword attribute weight; $\lambda_2$ denotes the emotional word attribute weight; $\lambda_3$ denotes the punctuation mark attribute weight; $\lambda_4$ denotes the weight of the position attribute of the sentence in the user comment; and the four remaining terms of the formula denote, respectively, the evaluation value of the keyword attribute, the evaluation value of the emotional word attribute, the evaluation value of the punctuation mark attribute, and the evaluation value of the position attribute;
the above keywords are extracted using the following table:
TABLE 2
The emotional words are extracted using the HowNet general lexicon; the punctuation marks are extracted from a fixed set of marks that includes ? and !; and the position attribute is extracted by weighting the first and last sentences of each comment. The individual attribute weights are obtained by maximizing the classification accuracy.
For the evaluation value of each attribute, if a sentence contains several instances of that attribute, the number of instances is taken as the evaluation value of the attribute.
Finally, taking the sentence with the highest evaluation value in the user comments as an emotion key sentence of the user comments;
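The key sentence selection can be sketched as below. The scoring formula is not reproduced in this text, so the sketch assumes a linear weighted sum of the four attribute evaluation values, with counts as evaluation values and a position value of 1 for the first and last sentence. The keyword list, lexicon, punctuation set, and weights are all hypothetical.

```python
def sentence_score(tokens, index, n_sentences, keywords, emotion_words, marks, lambdas):
    """Assumed form: weighted sum of the four attribute evaluation values."""
    kw = sum(t in keywords for t in tokens)        # keyword count
    em = sum(t in emotion_words for t in tokens)   # emotional word count
    pm = sum(t in marks for t in tokens)           # punctuation mark count
    pos = 1 if index in (0, n_sentences - 1) else 0  # first or last sentence
    l1, l2, l3, l4 = lambdas
    return l1 * kw + l2 * em + l3 * pm + l4 * pos


def key_sentence_index(sentences, keywords, emotion_words, marks, lambdas):
    """Index of the sentence with the highest score (first one on ties)."""
    scores = [
        sentence_score(s, i, len(sentences), keywords, emotion_words, marks, lambdas)
        for i, s in enumerate(sentences)
    ]
    return max(range(len(sentences)), key=scores.__getitem__)


comment = [
    ["parking", "was", "easy"],
    ["overall", "food", "delicious", "!"],
    ["ok"],
]
key_idx = key_sentence_index(
    comment,
    keywords={"overall"},
    emotion_words={"delicious", "easy"},
    marks={"!"},
    lambdas=(1.0, 1.0, 1.0, 1.0),
)
```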
Step 2.3.4: training an emotion key sentence classifier $f_1$, a non-emotion key sentence classifier $f_2$, and a full-text classifier $f_3$ on all labeled user comments, and labeling the unlabeled user comments of the shops by classifier fusion, wherein labeling means assigning an emotional tendency, either positive or negative, to a user comment, specifically:
training the emotion key sentence classifier $f_1$ on the emotion key sentences of the labeled user comments, and applying the trained $f_1$ to an unlabeled user comment to obtain a first group of emotional tendency probability values for that comment;
training the non-emotion key sentence classifier $f_2$ on the sentences of the labeled user comments other than the emotion key sentences, and applying the trained $f_2$ to the unlabeled user comment to obtain a second group of emotional tendency probability values;
training the full-text classifier $f_3$ on the labeled user comments as a whole, and applying the trained $f_3$ to the unlabeled user comment to obtain a third group of emotional tendency probability values;
in a specific implementation, each group of probability values comprises a probability value of positive emotional tendency and a probability value of negative emotional tendency, and the classifier adopts an LR classifier.
Finally, the maximum probability value of the positive emotional tendency and the maximum probability value of the negative emotional tendency are selected from the three groups of probability values and compared: if the maximum probability value of the positive emotional tendency is greater than or equal to that of the negative emotional tendency, the user comment is labeled as a good review; if it is smaller, the user comment is labeled as a bad review;
the specific fusion method assumes that the class of each unlabeled sample is determined by the sub-classifier with the highest confidence, and the above method avoids the adverse effect of noise features in classification decision.
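The fusion rule can be sketched directly from the comparison described above. The three probability pairs stand in for the outputs of the trained classifiers; the function name and the sample values are hypothetical.

```python
def fuse_labels(prob_groups):
    """Fuse three (p_positive, p_negative) pairs from f1, f2, and f3.

    The label follows whichever tendency has the larger maximum
    probability across the three classifiers; ties go to "good".
    """
    max_pos = max(p_pos for p_pos, _ in prob_groups)
    max_neg = max(p_neg for _, p_neg in prob_groups)
    return "good" if max_pos >= max_neg else "bad"


# f2 is the most confident classifier here and it votes negative.
label = fuse_labels([(0.55, 0.45), (0.30, 0.70), (0.60, 0.40)])
```

Because only the single most confident probability on each side matters, a classifier that sees mostly noise (for example $f_2$ on a comment whose sentiment sits in the key sentence) tends to produce probabilities near 0.5 and is effectively ignored.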
Step 2.3.5: social contribution to the store is calculated and ranked based on the number of reviews and the goodness of review. The social contribution S of the store is calculated using the following formula:
$N = u + v$
$p = u / N$
wherein $S$ denotes the social contribution degree of the shop, $u$ denotes the total number of good reviews among all user comments, $v$ denotes the total number of bad reviews among all user comments, $N$ denotes the total number of user comments, $p$ denotes the good review rate, and $z_\alpha$ is the quantile of the normal distribution; in general, $z_\alpha$ may simply be set to 2.
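The formula for $S$ is not reproduced in this text. The symbols that survive (a review count $N$, a good review rate $p$, and a normal quantile $z_\alpha$ set to about 2) are consistent with the lower bound of the Wilson score interval, so the sketch below uses that bound as an assumption, not as the confirmed formula.

```python
import math


def social_contribution(u, v, z=2.0):
    """Assumed form: lower bound of the Wilson score interval.

    u = number of good reviews, v = number of bad reviews,
    z = normal quantile (the text suggests a value of 2).
    """
    n = u + v
    if n == 0:
        return 0.0  # no reviews, no evidence
    p = u / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


many_reviews = social_contribution(90, 10)
few_reviews = social_contribution(9, 1)
```

With this form, two shops with the same 90% good review rate are ranked differently: the one with more reviews scores higher, which fits the stated goal of ranking by both the number of reviews and the good review rate.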
Step 3: after the two sub-problems of the location contribution degree and the social contribution degree of the shop are solved, the two calculation results are combined, and the combined location contribution degree of the shop is calculated and evaluated by a fusion method as the output of the whole research problem.
Implementations may use linear weighting for fusion.
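A minimal sketch of linear weighted fusion follows. The text does not give the weights or say whether the two scores are rescaled first; the min-max normalization and the default weight of 0.5 are assumptions added so that the two scores are on a comparable scale.

```python
def min_max(values):
    """Rescale values to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_scores(location, social, w_location=0.5):
    """Linear weighted fusion of per-shop location and social scores."""
    loc_n, soc_n = min_max(location), min_max(social)
    return [
        w_location * loc + (1 - w_location) * soc
        for loc, soc in zip(loc_n, soc_n)
    ]


# Three hypothetical shops.
fused = fuse_scores([1.0, 3.0, 2.0], [0.8, 0.2, 0.5], w_location=0.75)
```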