
CN103678610A - Method for recognizing object based on intelligent mobile phone sensor - Google Patents

Method for recognizing object based on intelligent mobile phone sensor Download PDF

Info

Publication number
CN103678610A
CN103678610A (Application CN201310690339.4A)
Authority
CN
China
Prior art keywords
camera
picture
fov
candidate
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310690339.4A
Other languages
Chinese (zh)
Inventor
寿黎但
陈珂
陈刚
胡天磊
彭湃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201310690339.4A priority Critical patent/CN103678610A/en
Publication of CN103678610A publication Critical patent/CN103678610A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract



The invention discloses an object recognition method based on smartphone sensors. The method makes full use of the rich sensor parameters of a smartphone, including GPS positioning, the camera and the camera parameters, and proposes a probabilistic FOV model based on geospatial position together with an associated pruning strategy and a similarity measure based on visual space. By combining these modalities, the method of the present invention can correctly identify the object queried by the user.


Description

Object recognition method based on smartphone sensors
Technical field
The present invention relates to spatial data indexing, image recognition and retrieval, and sparse coding in the field of signal processing, and in particular to an object recognition method based on smartphone sensors.
Background technology
Spatial data are data that represent the position, shape, size and distribution characteristics of spatial entities; at present they are mainly used in the field of Geographic Information Systems (GIS). Using spatial data requires a large number of spatial operations such as queries, insertions and deletions, so an efficient spatial index is essential. The R-tree is currently the most widely used spatial index structure; it extends the B-tree to multi-dimensional space and is a balanced tree structure. The R-tree approximates complex spatial objects by minimum bounding rectangles aligned with the axes of the data space, so a complex object can be represented with only a few bytes. Although much information is lost in this way, the minimum bounding rectangle retains the most important geometric properties of the object, namely its position and its extent along each coordinate axis.
In the field of image recognition and retrieval, the bag-of-visual-words model is one of the most commonly used and most effective methods. Inspired by the bag-of-words model in text retrieval, it clusters the local feature points of a large number of pictures in an offline phase; the cluster centres are the visual words. For a new picture, the detected local features only need to be projected onto these pre-computed visual words to represent the image as a feature vector over the vocabulary: the dimensionality of the vector equals the vocabulary size, and the value in each dimension is the frequency with which that visual word occurs in the image. Once images are vectorised, similarity measures from the vector space model (such as the cosine distance) can be applied directly, and new image categories can be recognised by training a k-nearest-neighbour classifier or a support vector machine on existing samples.
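As a concrete illustration of the bag-of-visual-words encoding just described, the sketch below clusters pre-extracted local descriptors into a vocabulary and turns an image into a word-frequency vector. It is a minimal sketch under stated assumptions: the descriptors are assumed to be already extracted (e.g. 128-dimensional SIFT vectors), and the vocabulary size, function names and use of scikit-learn's KMeans are illustrative choices, not part of the patent.

```python
# Minimal bag-of-visual-words sketch (assumes local descriptors are already extracted).
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors: np.ndarray, n_words: int = 1000) -> KMeans:
    """Offline phase: cluster local descriptors from many pictures;
    the cluster centres are the visual words."""
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(all_descriptors)

def encode_image(descriptors: np.ndarray, vocab: KMeans) -> np.ndarray:
    """Project one image's descriptors onto the vocabulary and count word frequencies."""
    words = vocab.predict(descriptors)                       # nearest visual word per descriptor
    hist, _ = np.histogram(words, bins=np.arange(vocab.n_clusters + 1))
    return hist / max(hist.sum(), 1)                         # frequency vector over the vocabulary
```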
Sparse coding originated in neuroscience: neurophysiologists carried out comprehensive and in-depth research on the visual system and obtained significant results, which made it possible to simulate the visual system with computers in engineering. Building on this understanding, and drawing on existing biological findings together with signal processing, computational theory and information theory, the visual system can be modelled computationally so that a computer simulates human vision to some extent, helping to solve difficult problems that artificial intelligence encounters in image processing. The basic assumption of sparse coding is that there exists a set of basis vectors such that any input signal (vector) can be represented as a linear combination of them, where only a few of the combination coefficients are non-zero; these coefficients are the "sparse code".
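For reference, the reconstruction problem described here is commonly written as the following optimisation (a standard textbook formulation, not a formula given in the patent; μ denotes the sparsity weight, chosen here to avoid confusion with the balance factor λ used later in this document):

$$\min_{\alpha}\ \tfrac{1}{2}\,\lVert x - D\alpha \rVert_{2}^{2} \;+\; \mu\,\lVert \alpha \rVert_{1}$$

where x is the input signal, the columns of D are the basis vectors, and the sparse code α has only a few non-zero entries.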
Summary of the invention
The object of the invention is to address the deficiencies of the prior art by providing an object recognition method based on smartphone sensors.
The technical scheme adopted by the present invention to solve this technical problem is as follows: an object recognition method based on smartphone sensors, characterised in that the steps of the method are as follows:
(1) the user submits an object query on the smartphone, comprising the GPS coordinates, the camera FOV parameters and the picture captured by the camera;
(2) a spatial-index R-tree query is issued with the GPS coordinates to obtain the set of spatially neighbouring pictures;
(3) a probabilistic FOV model is built from the GPS coordinates and camera FOV parameters over the above picture set; the model takes into account the uncertainty of the GPS positioning and of the camera parameters and can estimate the probability that an object is captured by the camera; the probabilistic FOV model is:
$$\int_{Q} e^{-\frac{\lVert c\theta \rVert^{2}}{2\sigma_{1}^{2}}} \cdot e^{-\frac{\lVert d \rVert^{2}}{2\sigma_{2}^{2}}}\, dq$$
where Q is the circular region of GPS uncertainty described above, q is a unit area within Q, d and θ are respectively the ground distance and the deviation angle of the object from the camera, σ₁ and σ₂ are empirical constants, and c = 1;
(4) based on the probabilistic FOV model, a pruning strategy is further proposed: for a query, any object outside the set of probabilistic FOV instances corresponding to the query cannot be a candidate object; pictures related to objects that can hardly be captured in geographic space are deleted, which reduces the size of the candidate picture set;
(5) the visual-feature similarity between the query picture and the candidate picture set is computed; this similarity is measured from the angle of signal reconstruction by solving for the sparse code of the query picture over the candidate picture set;
(6) the visual similarity between the query picture and each object is computed by accumulating the picture similarities belonging to the same object and then normalising, which yields the similarity value with the object;
(7) steps 5 and 6 are combined by voting to obtain a comprehensive evaluation score for each candidate object;
(8) the comprehensive evaluation scores of the candidate objects in step 7 are sorted; the object with the highest score is the final result.
The beneficial effects of the present invention are as follows: by building a simple feature-extraction and geospatial-location index over a large number of photos on the internet (photo-sharing websites, etc.), the object captured in a query submitted from a smartphone (comprising GPS coordinates, camera FOV parameters and a picture) can be identified. The probabilistic FOV model based on geospatial location, the associated pruning strategy and the similarity measure based on visual space make the recognition algorithm both efficient and accurate.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the present invention.
Embodiments
The technical scheme of the present invention is further described below in combination with a concrete implementation and example.
As shown in Fig. 1, the specific implementation process and working principle of the present invention are as follows:
Step 1: the user initiates a query in the smartphone app. Specifically, the user takes a photo of the appearance of a certain object, and the app simultaneously records the current GPS coordinates, the camera FOV parameters (including the camera orientation, etc.) and the captured picture;
Step 2: in the offline phase, a spatial data index (e.g. an R-tree) is built over the geographic positions of the pictures in the database, and a visual feature vector is extracted from every picture with the visual word model, so that each picture corresponds to a geospatial coordinate, a visual feature vector and an associated object label;
For a query submitted online by the user, the submitted GPS coordinates are used to perform a point query on the existing spatial index, which returns the set of pictures near the query point; these pictures form the candidate picture set, and the objects they correspond to form the candidate object set;
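As one possible concrete realisation of the offline index and the online point query of Step 2, the sketch below uses the third-party Python package `rtree`; the patent does not prescribe a particular library, and the data layout (id, longitude, latitude) is an assumption of this sketch.

```python
# Sketch of the spatial index over photo positions and the nearest-neighbour query,
# using the `rtree` package as one possible R-tree implementation.
from rtree import index

def build_photo_index(photos):
    """photos: iterable of (photo_id, lon, lat); points are stored as degenerate boxes."""
    idx = index.Index()
    for photo_id, lon, lat in photos:
        idx.insert(photo_id, (lon, lat, lon, lat))
    return idx

def nearby_photos(idx, query_lon, query_lat, k=50):
    """Return the ids of the k photos stored closest to the query GPS fix."""
    return list(idx.nearest((query_lon, query_lat, query_lon, query_lat), k))
```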
Step 3: the pruning strategy of the probabilistic FOV model further reduces the size of the candidate object set: objects that can hardly be captured by the camera in geographic space are deleted from the set, and the corresponding pictures are deleted with them.
The traditional FOV model comprises four parameters: the position of the camera, the camera orientation, the viewing angle of the camera and the maximum visible distance of the camera. All of these are readily obtained from the sensors of a smartphone, and the projection of the field of view onto a two-dimensional plane is a sector. However, because of device measurement limitations, the GPS coordinates are usually rather inaccurate; the error of civilian GPS positioning equipment is generally around 50 metres, so the measured GPS coordinates carry an uncertainty that the traditional FOV model does not take into account.
In the probabilistic FOV model, a variable r is introduced to control the uncertainty of the GPS fix: the uncertainty region is a circle centred on the measured GPS coordinate with radius r, indicating that the true GPS position may lie anywhere within this circular region. In addition, the probabilistic FOV model assumes that the probability that an object is captured by the camera depends on the object's ground distance and deviation angle from the camera and follows a Gaussian distribution. The probability that the object is captured by the camera under this uncertainty is therefore the integral of the probability density function over the circular region, and this value measures how likely the object is to be captured in geographic space:
$$\int_{Q} e^{-\frac{\lVert c\theta \rVert^{2}}{2\sigma_{1}^{2}}} \cdot e^{-\frac{\lVert d \rVert^{2}}{2\sigma_{2}^{2}}}\, dq$$
where Q is the circular region of GPS uncertainty described above, q is a unit area within Q, d and θ are respectively the ground distance and the deviation angle of the object from the camera, and σ₁ and σ₂ are empirical constants. Since a change in σ₁ can absorb the scaling effect of c, c is omitted without loss of generality and c = 1 is used here.
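The integral above generally has no closed form, so one straightforward way to evaluate it is Monte Carlo sampling over the circular uncertainty region Q. The sketch below is an illustration only: the units (metres for distances, radians for angles, heading measured like atan2 from the +x axis), the default values of σ₁, σ₂ and r, and the sample count are assumptions, not values specified in the patent.

```python
# Monte Carlo sketch of the capture-probability integral of the probabilistic FOV model.
import math
import random

def fov_capture_probability(gps, heading, obj, r=50.0,
                            sigma1=math.radians(30), sigma2=100.0, n_samples=2000):
    """gps, obj: planar (x, y) positions in metres; heading: camera direction in radians,
    measured like atan2 (counter-clockwise from the +x axis); r: GPS uncertainty radius."""
    gx, gy = gps
    ox, oy = obj
    total = 0.0
    for _ in range(n_samples):
        # Sample a camera position q uniformly inside the circle Q around the GPS fix.
        rho = r * math.sqrt(random.random())
        phi = random.uniform(0.0, 2.0 * math.pi)
        qx, qy = gx + rho * math.cos(phi), gy + rho * math.sin(phi)
        d = math.hypot(ox - qx, oy - qy)                     # ground distance to the object
        bearing = math.atan2(oy - qy, ox - qx)
        theta = abs((bearing - heading + math.pi) % (2.0 * math.pi) - math.pi)  # deviation angle, c = 1
        total += math.exp(-theta ** 2 / (2.0 * sigma1 ** 2)) * math.exp(-d ** 2 / (2.0 * sigma2 ** 2))
    return (total / n_samples) * (math.pi * r * r)           # estimate of the integral over Q
```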
Step 4: the pruning strategy based on the above probabilistic FOV model is introduced next. Because of the GPS uncertainty discussed above, for any object that appears in a given query there must exist some probabilistic FOV instance (a sector) that contains it; in other words, the set of all possible probabilistic FOV instances must cover all possible candidate objects. Therefore, for a query, any object lying outside the set of probabilistic FOV instances corresponding to the query cannot be a candidate object. The pruning strategy deletes the pictures related to objects that can hardly be captured in geographic space, which shrinks the candidate picture set, removes unnecessary cost from the subsequent visual-feature computation and improves the performance of the whole system.
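Given such a capture-probability estimate, the pruning itself reduces to discarding candidates whose probability is negligible. The sketch below reuses the `fov_capture_probability` function from the previous sketch; the threshold value is an illustrative assumption.

```python
# Sketch of the pruning step: drop candidate photos whose object can hardly be captured.
def prune_candidates(candidates, gps, heading, threshold=1e-3):
    """candidates: iterable of (photo_id, object_id, object_xy) tuples."""
    return [(photo_id, object_id, object_xy)
            for photo_id, object_id, object_xy in candidates
            if fov_capture_probability(gps, heading, object_xy) > threshold]
```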
Step 5: the visual similarity between the query picture and all objects obtained in the above steps is computed, indicating, from the visual point of view, which object the query picture submitted by the user most resembles.
Traditional picture similarity is computed in a vector space model, for example with the cosine distance or the Euclidean distance. The biggest problem with this class of methods is that the similarity values between pictures lack discrimination, because picture feature vectors are high-dimensional and sparse. To overcome this problem, a similarity computation based on sparse coding from the field of signal processing is proposed.
The basic idea of sparse coding is that, given some basis vectors (also called atoms) in the signal space, any input can be represented as a linear combination of these basis vectors; the combination coefficients are the code, and only a few of them are non-zero while most are zero, hence "sparse". Normally the basis vectors are learned from a large number of training samples; in this method the candidate picture set from step (3) is taken as the basis, so the problem becomes: given the feature vectors of these pictures, can the feature vector of a new input picture be reconstructed as a linear combination of them? This is exactly the basic problem of sparse coding, and the solved coefficient is the similarity between the input picture and the corresponding candidate picture. From the signal-processing point of view, it can be understood as the contribution that the candidate picture makes to reconstructing the input picture.
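A minimal sketch of this reconstruction view follows: the candidate photos' feature vectors form the dictionary, and an L1-regularised least-squares (Lasso) solver yields one sparse coefficient per candidate photo. The choice of solver and the regularisation strength are assumptions of this sketch; the patent does not name a specific optimisation method.

```python
# Sketch of the sparse-coding similarity: reconstruct the query vector from candidate vectors.
import numpy as np
from sklearn.linear_model import Lasso

def sparse_codes(query_vec: np.ndarray, candidate_vecs: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    """candidate_vecs has shape (n_candidates, dim); returns one coefficient per candidate,
    interpreted as that photo's contribution to reconstructing the query photo."""
    dictionary = candidate_vecs.T                  # columns are the candidate photos
    model = Lasso(alpha=alpha, positive=True, max_iter=10000)
    model.fit(dictionary, query_vec)
    return model.coef_                             # sparse: most entries are zero
```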
Step 6: the similarities of the pictures belonging to the same object are accumulated and then normalised to obtain the visual similarity between the query picture and that object; this metric measures, from the image content itself, which object the query is closer to.
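The per-object accumulation and normalisation of Step 6 can be sketched as follows; clamping coefficients at zero and normalising the scores to a unit sum are assumptions of this sketch.

```python
# Sketch of Step 6: sum per-photo coefficients by object label, then normalise.
def object_visual_similarity(coefs, photo_object_ids):
    """coefs[i] is the sparse coefficient of candidate photo i;
    photo_object_ids[i] is the object label attached to that photo."""
    scores = {}
    for coef, obj in zip(coefs, photo_object_ids):
        scores[obj] = scores.get(obj, 0.0) + max(float(coef), 0.0)
    total = sum(scores.values()) or 1.0
    return {obj: s / total for obj, s in scores.items()}
```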
Step 7: the probability that an object is captured by the camera, obtained from the geographic-space perspective in steps (3) and (4), and the similarity between the query picture and that object from the visual-feature perspective are weighted together to obtain the comprehensive evaluation score of the object. A weight variable λ, called the balance factor, controls whether the evaluation score leans more towards the geographic-space side or the visual-feature side.
In the present invention the balance factor is an adjustable parameter: if it is zero, the recognition algorithm uses only geographic-space information and ignores the visual features; if it is one, only visual-feature information is used and geospatial information is ignored. Clearly, in regions where objects are relatively sparse, the purely geographic probabilistic FOV model can already separate to a large extent the probabilities with which different objects are captured by the camera, so the similarity in visual space need not be computed and the computational cost stays small. Conversely, in regions where objects are dense, the probabilistic FOV model no longer discriminates well, and the weight of the visual-feature similarity must be increased to compensate.
Step 8: the comprehensive evaluation score of each object in the candidate object set has been obtained in step (7); after sorting these scores, the object with the maximum value is the one captured by the user's camera, which completes the automatic recognition.
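Steps 7 and 8 can be sketched as a weighted combination followed by sorting. Treating the combination as the convex sum (1 − λ) times the geographic capture probability plus λ times the visual similarity is an assumption consistent with the balance-factor description above (λ = 0 uses geography only, λ = 1 uses visual features only); the patent states only that the two scores are weighted.

```python
# Sketch of Steps 7-8: combine geographic and visual scores per object, then rank.
def rank_objects(geo_probs, vis_sims, lam=0.5):
    """geo_probs / vis_sims: dicts mapping object_id to the FOV-model probability /
    the normalised visual similarity; lam is the balance factor."""
    objects = set(geo_probs) | set(vis_sims)
    combined = {o: (1.0 - lam) * geo_probs.get(o, 0.0) + lam * vis_sims.get(o, 0.0)
                for o in objects}
    # The highest-scoring object is the recognition result.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```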

Claims (1)

1. An object recognition method based on smartphone sensors, characterised in that the steps of the method are as follows:
(1) the user submits an object query on the smartphone, comprising the GPS coordinates, the camera FOV parameters and the picture captured by the camera;
(2) a spatial-index R-tree query is issued with the GPS coordinates to obtain the set of spatially neighbouring pictures;
(3) a probabilistic FOV model is built from the GPS coordinates and camera FOV parameters over the above picture set; the model takes into account the uncertainty of the GPS positioning and of the camera parameters and can estimate the probability that an object is captured by the camera; the probabilistic FOV model is
$$\int_{Q} e^{-\frac{\lVert c\theta \rVert^{2}}{2\sigma_{1}^{2}}} \cdot e^{-\frac{\lVert d \rVert^{2}}{2\sigma_{2}^{2}}}\, dq$$
where Q is the circular region of GPS uncertainty described above, q is a unit area within Q, d and θ are respectively the ground distance and the deviation angle of the object from the camera, σ₁ and σ₂ are empirical constants, and c = 1;
(4) based on the probabilistic FOV model, a pruning strategy is further proposed: for a query, objects outside the set of probabilistic FOV instances corresponding to the query cannot be candidate objects; pictures related to objects that can hardly be captured in geographic space are deleted, which reduces the size of the candidate picture set;
(5) the visual-feature similarity between the query picture and the candidate picture set is computed; this similarity is measured from the angle of signal reconstruction by solving for the sparse code of the query picture over the candidate picture set;
(6) the visual similarity between the query picture and each object is computed by accumulating the picture similarities of the same object and then normalising, which yields the similarity value with the object;
(7) steps 5 and 6 are combined by voting to obtain the comprehensive evaluation score of the candidate objects;
(8) the comprehensive evaluation scores of the candidate objects in step 7 are sorted; the object with the highest score is the final result.
CN201310690339.4A 2013-12-16 2013-12-16 Method for recognizing object based on intelligent mobile phone sensor Pending CN103678610A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310690339.4A CN103678610A (en) 2013-12-16 2013-12-16 Method for recognizing object based on intelligent mobile phone sensor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310690339.4A CN103678610A (en) 2013-12-16 2013-12-16 Method for recognizing object based on intelligent mobile phone sensor

Publications (1)

Publication Number Publication Date
CN103678610A true CN103678610A (en) 2014-03-26

Family

ID=50316155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310690339.4A Pending CN103678610A (en) 2013-12-16 2013-12-16 Method for recognizing object based on intelligent mobile phone sensor

Country Status (1)

Country Link
CN (1) CN103678610A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107870961A (en) * 2016-09-23 2018-04-03 李雨暹 Method and system for searching and sorting space objects and computer readable storage device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708380A (en) * 2012-05-08 2012-10-03 东南大学 Indoor common object identification method based on machine vision
CN102819752A (en) * 2012-08-16 2012-12-12 北京理工大学 System and method for outdoor large-scale object recognition based on distributed inverted files
CN102831405A (en) * 2012-08-16 2012-12-19 北京理工大学 Method and system for outdoor large-scale object identification on basis of distributed and brute-force matching

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708380A (en) * 2012-05-08 2012-10-03 东南大学 Indoor common object identification method based on machine vision
CN102819752A (en) * 2012-08-16 2012-12-12 北京理工大学 System and method for outdoor large-scale object recognition based on distributed inverted files
CN102831405A (en) * 2012-08-16 2012-12-19 北京理工大学 Method and system for outdoor large-scale object identification on basis of distributed and brute-force matching

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PAI PENG et al.: "The Knowing Camera: Recognizing Places-of-Interest in Smartphone Photos", Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107870961A (en) * 2016-09-23 2018-04-03 李雨暹 Method and system for searching and sorting space objects and computer readable storage device
CN107870961B (en) * 2016-09-23 2020-09-18 李雨暹 Method and system for searching and sorting space objects and computer readable storage device

Similar Documents

Publication Publication Date Title
CN111199564B (en) Indoor positioning method and device of intelligent mobile terminal and electronic equipment
Ranganathan et al. Towards illumination invariance for visual localization
CN107133325B (en) A geospatial location method for internet photos based on street view map
CN111046125A (en) Visual positioning method, system and computer readable storage medium
WO2020224305A1 (en) Method and apparatus for device positioning, and device
CN102034101B (en) Method for quickly positioning circular mark in PCB visual detection
CN109658445A (en) Network training method, increment build drawing method, localization method, device and equipment
JP5385105B2 (en) Image search method and system
CN110704712A (en) Recognition method and system of scene picture shooting location range based on image retrieval
CN108961330A (en) The long measuring method of pig body and system based on image
CN103489191B (en) A kind of remote sensing images well-marked target change detecting method
CN114241464A (en) Cross-view image real-time matching geolocation method and system based on deep learning
Wu et al. Accurate smartphone indoor visual positioning based on a high-precision 3D photorealistic map
CN102938075A (en) RVM (relevant vector machine) method for maximum wind radius and typhoon eye dimension modeling
Vishal et al. Accurate localization by fusing images and GPS signals
CN112258580A (en) Visual SLAM loop detection method based on deep learning
CN105045841B (en) With reference to gravity sensor and the characteristics of image querying method of image characteristic point angle
CN108537101A (en) A kind of pedestrian's localization method based on state recognition
Song et al. A handheld device for measuring the diameter at breast height of individual trees using laser ranging and deep-learning based image recognition
Wu et al. An efficient visual loop closure detection method in a map of 20 million key locations
Zhang et al. Topological spatial verification for instance search
Xue et al. A fast visual map building method using video stream for visual-based indoor localization
CN109739830B (en) Position fingerprint database rapid construction method based on crowdsourcing data
CN103678610A (en) Method for recognizing object based on intelligent mobile phone sensor
Zhao et al. CrowdOLR: Toward object location recognition with crowdsourced fingerprints using smartphones

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140326

WD01 Invention patent application deemed withdrawn after publication