[go: up one dir, main page]

CN110175258B - Mobile perception data query method based on position sensitive hash index - Google Patents

Mobile perception data query method based on position sensitive hash index Download PDF

Info

Publication number
CN110175258B
CN110175258B CN201910324550.1A CN201910324550A CN110175258B CN 110175258 B CN110175258 B CN 110175258B CN 201910324550 A CN201910324550 A CN 201910324550A CN 110175258 B CN110175258 B CN 110175258B
Authority
CN
China
Prior art keywords
hash
data
index
query
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910324550.1A
Other languages
Chinese (zh)
Other versions
CN110175258A (en
Inventor
季长清
汪祖民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University filed Critical Dalian University
Priority to CN201910324550.1A priority Critical patent/CN110175258B/en
Publication of CN110175258A publication Critical patent/CN110175258A/en
Application granted granted Critical
Publication of CN110175258B publication Critical patent/CN110175258B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a mobile perception data query method based on a position sensitive hash index, belongs to the field of big data and mobile application, aims to reduce high-dimensional index cost, regards objects in a high-dimensional space as space data points with position information, maps all object points in the space into m hash tables through a family of hash functions, gives a query point q, and calculates a result value of each q point in the hash function respectively.

Description

Mobile perception data query method based on position sensitive hash index
The application is a divisional application of the invention patent application with the application number of 201610083052.9 and the application date of 2016-02-05, and the invention creates a large-scale image query system based on an inverted position sensitive hash index in a mobile environment.
Technical Field
The invention belongs to the application field of large-scale space-time data processing and mobile technology, and relates to a large-scale image query system based on inverted position-sensitive hash index in a mobile environment
Background
At present, the network basically covers the life of people, and the mobile phone surfing becomes a main surfing mode. By 2014 and 6 months, in the Internet surfing equipment of the national citizens, the mobile phone utilization rate reaches 83.4 percent, the mobile phone is superior to the integral utilization rate of 80.9 percent of the traditional PC (using a desktop computer and a notebook computer) for the first time, and the position of the mobile phone serving as the first large Internet surfing terminal equipment is more consolidated. The development of information technology is very rapid, the quantity of various forms of information is also rapidly increasing, along with the diversification and complexity of the user search requirements, the user is not satisfied with simple text search any more, and the image is taken as an important information carrier, so that the daily life is full of rich and various image information. For example, the user can see the favorite beautiful head portrait, want to find similar head portrait, see a piece of clothes or skirt, want to find similar money, etc. which is inconvenient to express by words, and has the searching requirement of picture reference.
The development of information technology is very rapid today, the amount of information in various forms is also rapidly increasing, and as the requirements for user retrieval are diversified and complicated, users are no longer satisfied with simple text retrieval, but are more inclined to the information retrieval of images. The daily life is full of rich and diverse image information. For example, the user may see a favorite avatar, want to find a similar style, or see a garment or skirt, want to find a similar try, etc. The situation is inconvenient to express by words, but the retrieval requirement of a user can be met very quickly by using pictures. Smart phones are of course indispensable as collectors of images. From the new data it is known that 2015 global smart phone users will reach 19.1 billion and 2016 the index will increase by 12.6% to 21.6 billion. Smartphones will gradually occupy the information communication market.
It has become a crucial research hotspot in today's image retrieval field to investigate how a user should quickly find his own desired information from so many choices of images, and how to provide a quick and efficient way to retrieve images. In the existing research work, it is common practice to extract high-dimensional features from high-dimensional data of an image according to a specific method (such as a sift operator commonly used in the image), and then build an index according to the features to increase the query speed. However, under different data characteristics, the vector dimension is usually up to tens or hundreds of dimensions, and the data volume of each dimension is very large, which requires that the high-dimensional index structure has better dimension expansibility, that is, the index can still maintain better performance with the increase of the dimension. Unfortunately, most conventional spatial indexing techniques now suffer from dimensional disasters, such as rtrees and Voronoi indexes, and in general, the current high-dimensional feature indexing techniques suffer from the following disadvantages: (1) Most traditional index structures have poor expansibility and encounter dimension disaster problems; (2) Most traditional indexing mechanisms make certain assumptions (such as uniform distribution) about data distribution when dividing data space, and are usually different from real distribution (such as oblique distribution, zipf normal distribution, etc.) of data; (3) Most high-dimensional index structures have high spatial and temporal complexity and poor accuracy.
Disclosure of Invention
In order to solve the problem that the existing position-sensitive hash index cannot adapt to the distributed index, the invention provides a large-scale image query system based on the inverted position-sensitive hash index in a mobile environment, which can realize that the position-sensitive hash index adapts to the distributed index.
In order to achieve the above purpose, the present invention adopts the following technical scheme: a large-scale image query system based on inverted position sensitive hash index in mobile environment comprises a client, a cloud center service system and a client, wherein the client collects and extracts picture characteristics and communicates with the cloud center service system; and the cloud center service system establishes a distributed inverted index based on the position sensitive hash and inquires a neighbor image corresponding to the acquired picture.
The beneficial effects are that: because the cloud center service system establishes the inverted position-sensitive hash index, the position-sensitive hash index can adapt to distributed query, so that the problems of overlarge information quantity, inconsistent required information and display pictures and the like are solved, a user is helped to save the time of searching and querying as much as possible.
Drawings
FIG. 1 is a schematic diagram of a large-scale high-dimensional image retrieval system based on inverted position-sensitive hash indexes in a cloud computing environment;
FIG. 2 illustrates a kNN algorithm process based on distributed inverted grid indexes of the present invention;
FIG. 3 is a functional block diagram of the present invention;
FIG. 4 is a flow chart of image retrieval of the present invention;
FIG. 5 is a flow chart for implementing image lookup in accordance with the present invention.
Detailed Description
Example 1:a system for large-scale image querying based on inverted position-sensitive hash index in a mobile environment, comprising: the client acquires and extracts picture characteristics and communicates with the cloud center service system; and the cloud center service system utilizes the cloud strong distribution computing power to establish a reverse position sensitive hash index and inquire neighbor images corresponding to the acquired pictures.
Generally, the kNN algorithm is based on a distributed inverted index, but the position-sensitive hash-based index is not a distributed index, and in order to adapt to the distributed index, the position-sensitive hash-based distributed inverted index is established: in the technical scheme, when a position-sensitive Hash index is established, a plurality of Hash buckets are separated, the Hash buckets are used as keys, point sets in the Hash buckets are used as values, and MapReduce is used for distributed solving. According to the technical scheme, the index based on the position sensitive hash is analyzed into a Key-Value structure to adapt to the distributed index, so that the index can use a kNN algorithm to realize inquiry; and the distributed index can realize that adjacent points are collected to adjacent Hash buckets (similar data are hashed to the same area), so that the query speed can be increased.
In this technical solution, as a preferred option, the LSH regards the objects in the high-dimensional space as spatial data points with position information, and maps all the spatial object points to m hash tables T through a family of hash functions F () i In the above, m= |f|, that is, each hash function F e F corresponds to one hash table, and each hash table stores all object points in the space. Giving a query point q, and respectively calculating the result value of each q point in the hash function: { f 1(q) ,f 2(q) …f m(q) ,f i E F, i=1, 2 … m }, all F i(q) Fall into hash table T i And taking the points in the bucket as candidate sets, calculating the distance between the points and q, and finally sorting out k nearest points to obtain a kNN result set.
The technical scheme is used for solving the problem of high-dimensional data retrieval in a mass data environment, a high-dimensional data vector is actually abstract of a text, then the high-dimensional vector is subjected to dimension reduction by utilizing a hash projection technology by combining the characteristics of a position sensitive hash function, the high-dimensional vector is used as a hash index value, the corresponding high-dimensional vector is used as a data record, the same hash operation is performed on other data, and the obtained result is subjected to screening and query optimization to obtain a final candidate result set.
Traditional LSH algorithms can only be performed on a single machine, but are limited by the lack of performance, computing and storage resources of a single computer node. This limitation is more severe with the advent of dimension disasters, especially at high dimensions.
Under large-scale data, the original spatial index designed for a single machine and a typical query algorithm also need to be redesigned and optimized under a distributed environment. The research on how to introduce the distributed processing and the spatial index optimization technology into the cloud computing is combined, and the problem of effectively searching the large-scale space is a new research point which needs to be further developed.
In the distributed technology, the distributed file system can scientifically store all multimedia data in all different computer nodes according to different network environments and other factors, and the multimedia data are interconnected through a network, so that the whole is realized, and the problem of large-scale data storage is solved. In addition, as all data are stored in different computer nodes, the data security, feasibility, read-write efficiency and other aspects of the whole system are greatly improved. The distributed technology has a mature model in terms of calculation, and can improve the calculation speed and corresponding capability of the system, such as the Google distributed file system which is already proposed and applied.
Based on the above points, the performance of the LSH algorithm can be well improved by using the distributed technology, and the application of the distributed technology is also required by the reality of modern network multimedia development.
Example 2:the present embodiment has the same technical scheme as that of embodiment 1, and more specifically is: the embodiment discloses a specific method for establishing a position sensitive hash-based distributed inverted index, which is characterized in that a data set is stored in an HDFS distributed file system in advance, when a task is started, a plurality of configuration file LSH hash function families are read in through a distributed caching mechanism, each Map task reads in a data fragment designated by a JobTracker as input, then hash mapping dimension reduction is carried out on each data object according to a given hash function, a hash value is obtained after a high-dimensional vector is subjected to hash mapping, the hash value is used as an index value, for example, for the high-dimensional vector v, a hash value hi (v) is obtained after the i-th hash function hi () is mapped, and finally<i#hi(v),v.id>The key value pair is output, so that the binary group can be obtained after calculation for each high-dimensional data vector, and the same operation is performed for each piece of data by using each hash function. The output of the Map process serves as the input to Reduce where all data objects of the same hash are collected together to space the data objectsAnd separating, and finally outputting the separated result to an HDFS distributed file system for storage. In the process of constructing the distributed index, a Combine optimizing process can be added between the Map and the Reduce to Reduce transmission of intermediate data, and detailed pseudo codes for constructing the distributed index are as follows:
algorithm: LSH-based distributed index architecture
Input: high-dimensional dataset set S, hash function family H
And (3) outputting: high-dimensional index file
Example 3:this embodiment has the same technical scheme as embodiment 1 or 2, more specifically: the embodiment discloses a query method, wherein the query is a kNN query based on a position-sensitive hash distributed inverted index, and the steps are as follows: let the high-dimensional data set be S, which is a gallery of a large number of plants already in the image retrieval system, each image in the gallery storing 128-dimensional high-dimensional features. The query object set is Q, Q is a query image object, for example, a shot image of a group of flowers, the feature set is formed after high-dimensional feature extraction, each query object Q belongs to Q, an initialization association function h, h belongs to G, G is a hash family, LSH is a multi-round hash algorithm, and different hash functions can obtain different hash results. h corresponds to the set of similarity points of q, the radius set r= getCandidates (hashvalue), the hash value is a hash value, and the hash algorithm maps binary values of arbitrary length to smaller binary values of fixed length, which are called hash values. Different hash values can yield different hash results, where R is the bucket width.
A hash collision with Guan Haxi function is performed in a region with a radius r, each object of the hash is hash value=computer (q, h), and each hash value is obtained through a hash calculation function, namely, computer calculates. For example, taking the remainder is the simplest calculation method, collision is performed for multiple times until all data are hashed, and an object d=computer (q, c) with relatively high collision times is screened, so that the probability that the object d is a nearby point of the object q is highest. I.e. d is the object with the highest probability of collision. To reduce the amount of intermediate data of MapReduce, we have adopted a method based on hash collision count, replacing the actual value with collision count statistics. The number of times of collision is counted as a final sorting result, so that the intermediate data volume can be greatly reduced, and the processing speed of MapReduce is increased.
Preferably, the collision region is given an error calibration range (1+θ) r. When hash collision is carried out, the number of collisions is counted as a final sorting basis for the hash function F (h), and the more the number of collisions is, the more the sorting is, the earlier the collision is, and the collision region is given an error calibration range (1+theta) r because the r region has a certain error. And due to the existence of a system error, the collision needs to be carried out for a plurality of times to screen the final result: the collision occurs between q and adjacent data, and at the same time, a small part of non-adjacent data collides with the data; then performing second collision and third collision until all data are hashed, screening out points d=computer (q, c) with relatively high collision times, wherein the probability of the points being the adjacent points of q is highest, and then sorting out the data for integration.
Example 4:this embodiment has the same technical scheme as embodiment 1 or 2 or 3, more specifically: and the client extracts the high-dimensional characteristics of the picture, generates high-dimensional characteristic data of the picture, and transmits the high-dimensional characteristic data of the picture to the cloud center service system. Dimensions are typically extracted in more than 128 dimensions, experiments are performed at 128X128 dimensions, and large scale typically refers to greater than 10G-several hundred T. The experiment is a sample test performed under 16G data.
The scheme has the following effects:
(1) The application adopts a design mode of intelligent selection of the connecting line, so that the mobile phone can select a more proper cloud computing server to transmit data. The software is installed on the mobile phone and the server, the client is installed on the software of the smart mobile phone, once a user transmits a certain required similar picture to the mobile phone software, the software firstly extracts the data characteristics of the picture, and then the data is simply matched by utilizing partial data stored in the memory of the mobile phone. If the memory of the mobile phone contains the needed data, the software converts the data into corresponding pictures to be displayed on the software; if the corresponding data does not exist in the memory data of the mobile phone, the software is connected with the server by using the 2G, the 3G and the 4G, WIFI, the picture characteristic data is transmitted to the server, the LSRP-tree retrieval index is carried out in the server, and the data is fed back to the mobile phone software after the completion, and the result is displayed.
(2) Cloud computing is a resource sharing platform based on the Internet of things, and the platform can be provided for computers and other devices as required through shared software and hardware resources and information, so that a large-scale high-dimensional image retrieval system based on inverted position-sensitive hash indexes
The cloud computing characteristic is utilized to provide a powerful data processing system for the image characteristic query, and the support of the powerful data processing system can enable the mobile phone to present the matched images required by a user under the condition of limited hardware facilities.
(3) The data processed by the mobile phone after the user uploads the picture to the software is simply memory reproduction, and the mobile phone memory is limited after all, so the picture index can not basically meet the user requirement. The real picture index is to transmit the picture feature data to the cloud server by using the network. For the server, a massive data set is divided into a plurality of sub data sets through MapReduce in advance, tasks of the sub data sets are scheduled, then a distributed inversion space high-dimensional index based on LSH is established, and a final result is obtained through carrying out multi-round local sensitive hash dynamic collision detection algorithm calculation on similar data hashed to the same area.
The embodiment discloses a large-scale high-dimensional image retrieval system based on inverted position sensitive hash indexes in a cloud computing environment, and belongs to the application field of large-scale space-time data processing and mobile technology. In the system, a new index structure (LSRP-tree) is formed by combining LSH and RP-tree, so that the index cost is reduced in high-dimensional data query, and the query quality and the query efficiency are improved; the new algorithm (H-c 2 kNN) formed by combining LSH and MapReduce shows good expansibility and high efficiency. Both innovative applications practically solve the problem of approximate retrieval in high-dimensional data space. The collision counting method based on the hash collision is adopted, the collision counting statistics is used for replacing an actual value, the number of collisions is counted, and the number of collisions is used as a final sequencing result, so that the intermediate data volume can be greatly reduced, and the processing speed of MapReduce is increased. Cloud computing provides a very convenient communication platform for data communication under the condition of limited hardware. The invention relates to a system for searching pictures by utilizing an intelligent mobile platform, which comprises a group of cloud servers and a mobile client, in particular to software installed on the intelligent mobile platform (such as a smart phone or a tablet personal computer) for users to use respectively. The client comprises basic functions of gallery, photo shooting, transmission, picture scanning and the like, and the cloud server is responsible for controlling the whole picture searching process and processing related data (including building of LSH indexes, distributed kNN inquiry and the like) and performs image retrieval by extracting feature vectors. The invention effectively and advantageously solves the problems of overlarge information quantity, inconsistent information and pictures and the like, and helps users save time as much as possible, maximally solves the problem of eliminating massive information, and simplifies and rationalizes the utilization of resources. Meets the further demand of people for mobile information retrieval intellectualization.
Example 5:a large-scale high-dimensional image retrieval system based on inverted position sensitive hash index in a cloud computing environment comprises a cloud center service system and an intelligent mobile client. The cloud center service system is used for establishing a hash algorithm and executing inverted grid indexes, the intelligent mobile client is used for collecting pictures and sending the information to the cloud center service system through a wireless network, and the intelligent mobile client is also used for receiving the best pictures returned by the cloud center service system. The invention also improves the defects in the space data index and query method in the prior art, solves the problem of mass information screening to the maximum extent, simplifies and rationalizes the utilization of resources,meets the further demand of people for resource intellectualization. The basis of the application is an LSRP-tree index structure and an H-c2kNN query algorithm which are respectively generated by combining Location Sensitive Hash (LSH) with MapReduce and RP-tree.
Example 6:the technical scheme is the same as that of embodiment 5, when a user takes a picture with a mobile phone or acquires the picture through a wireless network, a corresponding search engine is applied to upload the picture, a cloud center service system automatically extracts the characteristics of color, shape, texture and the like of an image by using an image analysis program, the extracted picture feature vectors are subjected to data analysis and matching by using an established inverted grid index based on a hash algorithm, k nearest neighbors are returned according to the matching precision requirement, and then corresponding k images are found according to the k vectors, and information is fed back to a user side in time.
Example 7:the processing method of the position sensitive hash has the same technical scheme as that of the embodiment 6, and the processing method comprises the following steps: firstly, establishing a distributed inverted space high-dimensional index based on LSH through massive data sets, taking a hashed bucket as a Key, taking a point set in the bucket as a Value, and then carrying out distributed solving by using MapReduce. And then, combining a plurality of rounds of local sensitive hash and a dynamic collision frequency detection algorithm, and utilizing MapReduce to screen and query and optimize results. And finally, a result set is obtained, and meanwhile, adjacent points can be collected in adjacent barrels by using the inverted index, and the approximate searching speed can be increased by detecting the adjacent barrels.
In order to reduce the intermediate data volume of MapReduce, a method based on Hash collision counting is adopted, the actual value is replaced by collision counting statistics, the number of collisions is counted, and the final sorting result is used, so that the intermediate data volume can be greatly reduced, and the processing speed of MapReduce is increased.
The embodiment adopts a large-scale distributed hash algorithm and other space data processing algorithms, integrates a large-data processing mode into the query stage of the large-scale high-dimensional image retrieval system by analyzing the characteristics of the color, the shape, the texture and the like of the images, searches the optimal images in mass data, feeds back the image information to a user side, and finally completes the image retrieval problem. The effect is as follows: by analyzing the characteristics of the color, shape, texture and the like of the picture, the best approximate matching picture is provided for the user.
The large-scale high-dimensional image retrieval described in this embodiment has the following structure and benefits:
(1) The application adopts a design mode of intelligent selection of a connecting line, so that a mobile phone selects a more appropriate cloud computing server to transmit data; the software is installed on the mobile phone and the server, the client is installed on the software of the smart mobile phone, once a user transmits a certain required similar picture to the mobile phone software, the software firstly extracts the data characteristics of the picture, and then the data is simply matched by utilizing partial data stored in the memory of the mobile phone. If the memory of the mobile phone contains the needed data, the software converts the data into corresponding pictures to be displayed on the software; if the corresponding data does not exist in the memory data of the mobile phone, the software is connected with the server by using the 2G, the 3G and the 4G, WIFI, the picture characteristic data is transmitted to the server, the LSRP-tree retrieval index is carried out in the server, and the data is fed back to the mobile phone software after the completion, and the result is displayed.
(2) Cloud computing is a resource sharing platform based on the Internet of things, and the platform can be provided for computers and other devices as required through shared software and hardware resources and information. Therefore, the large-scale high-dimensional image retrieval system based on the inverted position-sensitive hash index utilizes the characteristic of cloud computing to provide a powerful data processing system for inquiring the picture characteristics, and the support of the powerful data processing system can enable the mobile phone to present the matched picture required by a user under the condition of limited hardware facilities.
(3) The data processed by the mobile phone after the user uploads the picture to the software is simply memory reproduction, and the mobile phone memory is limited after all, so the picture index can not basically meet the user requirement. The real picture index is to transmit the picture feature data to a cloud server by using a network, and for the server, a massive data set is divided into a plurality of sub data sets by MapReduce in advance, tasks of the sub data sets are scheduled, then a distributed inverted space high-dimensional index based on LSH is established, and a final result is obtained by carrying out multi-round local sensitive hash dynamic collision detection algorithm calculation by hashing similar data to the same area.
Example 8:the present embodiment gives a definition of kNN queries.
The formalized definition of a kNN query is given below: given a set of spatial data points P, query point q and an integer k (k > 0), k-nearest neighbor queries find a set kNN of k data points, and satisfy dist (P ', q). Ltoreq.dist (P, q) for any P' ∈kNN and any P ε P-kNN.
For picture indexes in a high-dimensional environment, the H-c2kNN algorithm formed by combining the LSH index and the MapReduce is not lost as a preferred algorithm. One important source of spatially large-scale data, high-dimensional data, is the high-speed growth of data. How to make kNN queries on large-scale spatial data under high-dimensional conditions is an important direction of interest in recent years.
The following steps are further explanation of hashvalue=computer (q, h) in embodiment 2, r= getCandidates (hashvalue), and algorithm execution is the calculation performed by (1) initialization (2) in step 1 in embodiment 2.
Setting a group of hash functions F (), calculating and mapping all objects in space to m hash tables T i In the above, m= |f|, that is, each hash function F belongs to one hash table corresponding to F, and each hash table stores all object points in the space. Given a query point q, function result values of each hash function to q point mapping are calculated respectively.
The following steps correspond to the calculation of the collision algorithm of d=computer (q, c) in embodiment 3, which corresponds to step 2 in embodiment 2, and is an optimization process. Example 2 is actually an algorithm execution process, and example 3 is a theoretical analysis of the algorithm, i.e., a mathematical operation process.
In performing a hash collision, for the hash function F (H), any M, M 'is found, and satisfying H (M) =h (M') presents a computational difficulty, then H (M) is referred to as a strong one-way hash function or a collision-free function, also commonly referred to as a hash function. Therefore, we replace the actual value with the hash collision number that is easier to perform. The number of times of collision is counted as a final sorting result, so that the amount of intermediate data can be greatly reduced, the MapReduce processing speed is increased, and the processing result is fed back to a user more quickly.
For server operation, data differentiation and data mirroring are utilized, and indiscriminate data service is provided by a cloud computing providing technology.
Example 9:referring to fig. 1, a schematic diagram of a large-scale high-dimensional image retrieval system based on an inverted position-sensitive hash index in a cloud computing environment is described as follows: establishing a distributed inverted space high-dimensional index based on LSH to perform knN query, setting a data set as S and a query object as Q, initializing a similar point set of a correlation function h, h belongs to G and the corresponding Q for each Q, and performing Guan Haxi function collision in a region with a radius of R, wherein each point of the hash is hashvalue=computer (Q, h), and an error calibration range (1+theta) R is given to the collision region because of a certain error in the R region. And due to the existence of a system error, the collision needs to be carried out for a plurality of times to screen the final result: the collision occurs between q and adjacent data, and at the same time, a small part of non-adjacent data collides with the data; then performing second collision and third collision until all data are hashed, screening out points d=computer (q, c) with relatively high collision times, wherein the probability of the points being the adjacent points of q is highest, and then sorting out the data for integration.
Example 10:referring to fig. 2, in the kNN query index of the large-scale high-dimensional image, since kNN algorithm is based on a distributed inverted grid index, we set the grid cell size as "g", and set a query point q, and use Aq to represent the grid where q points are located, use r as radius, use q as center of circle to make a circle, and P (xi, yi) represents the nearest distance of the q points in. For a specific hash function, given a point q, when finding its adjacent point, firstly, dividing the function into a plurality of 'bucket' keys (referring to a specific relation existence area, denoted by hv1, hv2, hv3, hvN) by using the function, then, correspondingly hashing the function into the specific bucket by using the relation between the data and the hash function, and simultaneously, matching q with the bucketAnd (3) carrying out collision on the data in the barrels of the corresponding relation, namely, keyi= hvi =g (i), and screening out the data with higher q collision times for integration, so as to obtain the similar data of a certain specific point.
For the index of a picture, more features are needed to determine a specific picture, so that similar data of other features are required to be obtained, for example, given another point P, similar data are also built in the same data table, R is taken as a radius, P is taken as a center of a circle, a neighbor data point is found, the nearest neighbor of P and q points possibly has the same point M, the probability that M is the feature data point of the index is particularly high, and the like, and hashing is performed until final data are found.
Example 11:referring to fig. 3, in consideration of the characteristics of portability of a mobile terminal, the limitation of software and hardware resources and the advantages of cloud computing, a large-scale high-dimensional image retrieval system based on an inverted position sensitive hash index uses a thin client mode of a C/S architecture, a cloud server is responsible for main data processing work, a client only needs to simply send a required chart, receives and displays a result, a handheld device client accesses a mobile internet to establish contact with a cloud server through a wireless network based on a 2 g/3 g/4 g mode or a WIFI, the client is responsible for displaying a picture and carrying relevant parameters, such as picture feature data information sends a request to the cloud server, a mobile phone logs in and then sends picture features to the cloud server, the cloud server adopts a high-dimensional spatial mass data distributed space inverted index technology designed by us, namely an LSH technology, and the cloud server adopts a parallelization kNN query technology to quickly find pictures required by a user from the massive chart. And inquiring the client software interface to which the picture is sent, thereby completing the requirement of the user.
Example 12:referring to fig. 4, the present invention is a system for searching pictures by using an intelligent mobile platform, which includes a cloud server and a mobile client, specifically, software installed on the intelligent mobile platform (such as a smart phone or a tablet computer) for users to use respectively. The client comprises a gallery, photo shooting, transmitting and picture scanningAnd the cloud server is responsible for controlling the whole picture searching process and processing related data (including establishing an LSH index, inquiring distributed kNN and the like).
Example 13:referring to fig. 5, the steps involved in implementing the graph lookup by the present invention are as follows: the user obtains the required similar pictures through shooting or other ways, scans through the client, the mobile phone client finds out the picture characteristics and uploads the picture characteristics to the cloud server, the cloud server performs data processing through the uploaded picture characteristic data, finds out the similar characteristic picture data, then returns the data to the client, and the client indexes the pictures to be displayed on a software interface so as to complete tasks. After the pictures are displayed on the client interface, picture numbers, source links and the like are displayed, so that a user can know chart information more deeply.
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art, who is within the scope of the present invention, should be covered by the protection scope of the present invention by making equivalents and modifications to the technical solution and the inventive concept thereof.

Claims (1)

1. A mobile perception data query method based on a position sensitive hash index is established, which is characterized in that: regarding objects in a high-dimensional space as spatial data points with position information, mapping all object points in space to m hash tables T through a family of hash functions F () i In the method, m= |F| is given to a query point q, and the result value of each q point in the hash function is calculated respectively: { f 1(q) ,f 2(q) …f m(q) ,f i E F, i=1, 2 … m }, all F i(q) Fall into hash table T i The points in the bucket are used as candidate sets for calculating the distance between the points and q, and k points closest to the points are finally sorted out, so that a kNN result set is obtained;
the specific method for establishing the reverse position sensitive hash index is that a data set is stored in an HDFS distributed file system in advance, when a task is started, a plurality of configuration file LSH hash function families are read in through a distributed caching mechanism, each Map task reads in data fragments designated by JobTracker as input, then hash mapping dimension reduction is carried out on each data object according to a given hash function, a hash value is obtained after a high-dimensional vector is subjected to hash mapping, the hash value is used as an index value, output is carried out in a key value pair mode, output of a Map process is used as input of a Reduce, all data objects of the same hash are collected together in the Reduce, the data objects are separated, and the data objects are output to the HDFS distributed file system as a result for storage;
the query is a kNN query based on a position sensitive hash distributed inverted index, and the steps are as follows: setting the size of a grid unit as a grapple, giving a query point q, using Aq to represent the grid where the q point is located, using r as a radius, q as a circle center to make a circle, and for a specific hash function, giving a point q, dividing the function into a plurality of barrel keys when finding adjacent points, wherein the barrel keys refer to a specific relation existence area and are expressed by hv1, hv2, hv3, and hvN; then the function utilizes the relation between the data and the hash function to hash into a specific barrel correspondingly, and meanwhile, q collides with the data in the barrel with the corresponding relation, and the data with q, which have higher collision times, are screened out and integrated, so that the similar data of a specific point can be obtained; for the index of a picture, more features are needed to determine a specific picture, similar data of other features are required to be obtained, another point P is given, the similar data is also built in the same data table, R is taken as a radius, P is taken as a circle center, adjacent data points are found, the same point M possibly appears in the nearest neighbor of P and q points, the probability that M is the index feature data point is particularly high, and the like is hashed until final data are found; the method for screening the data with higher q collision times comprises the following steps: setting a high-dimensional data set as S, wherein S is an existing large-scale image library in an image retrieval system, a query object set is Q, Q is a query image object, a feature set is formed after high-dimensional feature extraction is performed, for each query object Q belongs to Q, an association function h is initialized, h belongs to G, G is a hash family, LSH is a multi-round hash algorithm, different hash functions can obtain different hash results, h corresponds to a Q similarity point set, a radius set R= getCandidates (hashvalue), hash value is a hash value, different hash values are obtained, and R is a bucket width; and (3) carrying out hash collision with Guan Haxi function in a certain radius r region, wherein each object in the hash is hash value=computer (q, h), each hash value is obtained through a hash calculation function, the data are all hashed for multiple times, the object d=computer (q, c) with relatively high collision times is screened out, the probability that the object d is the nearest point of the object q is highest, the object d is the object with highest collision probability, the actual value is replaced by collision count statistics, and the number of times of collision is taken as the final sequencing result.
CN201910324550.1A 2016-02-05 2016-02-05 Mobile perception data query method based on position sensitive hash index Active CN110175258B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910324550.1A CN110175258B (en) 2016-02-05 2016-02-05 Mobile perception data query method based on position sensitive hash index

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910324550.1A CN110175258B (en) 2016-02-05 2016-02-05 Mobile perception data query method based on position sensitive hash index
CN201610083052.9A CN105760468B (en) 2016-02-05 2016-02-05 The extensive image query system of sensitive hash index is set under mobile environment based on ranking

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201610083052.9A Division CN105760468B (en) 2016-02-05 2016-02-05 The extensive image query system of sensitive hash index is set under mobile environment based on ranking

Publications (2)

Publication Number Publication Date
CN110175258A CN110175258A (en) 2019-08-27
CN110175258B true CN110175258B (en) 2024-01-23

Family

ID=56330084

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910324550.1A Active CN110175258B (en) 2016-02-05 2016-02-05 Mobile perception data query method based on position sensitive hash index
CN201610083052.9A Active CN105760468B (en) 2016-02-05 2016-02-05 The extensive image query system of sensitive hash index is set under mobile environment based on ranking

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201610083052.9A Active CN105760468B (en) 2016-02-05 2016-02-05 The extensive image query system of sensitive hash index is set under mobile environment based on ranking

Country Status (1)

Country Link
CN (2) CN110175258B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777130B (en) * 2016-12-16 2020-05-12 西安电子科技大学 A kind of index generation method, data retrieval method and device
CN108664478B (en) * 2017-03-27 2021-07-20 华为技术有限公司 Target object retrieval method and device
CN107818147A (en) * 2017-10-19 2018-03-20 大连大学 Distributed temporal index system based on Voronoi diagram
CN107658010A (en) * 2017-10-19 2018-02-02 大连大学 Portable medical querying method and application based on the row's of falling Thiessen polygon index
CN108153910B (en) * 2018-01-22 2021-11-16 大连大学 Establishing distributed space-time multidimensional indexing system for mobile medical service
CN109033314B (en) * 2018-07-18 2020-10-23 哈尔滨工业大学 Real-time query method and system for large-scale knowledge graph under condition of limited memory
CN111666358A (en) * 2019-03-05 2020-09-15 上海光启智城网络科技有限公司 Track collision method and system
CN110958109B (en) * 2019-10-12 2023-09-19 上海电力大学 Light dynamic data integrity auditing method based on hierarchical merck hash tree
CN113096284B (en) * 2021-03-19 2022-08-30 福建新大陆通信科技股份有限公司 CTID access control authorization information verification method
CN113588195B (en) * 2021-08-10 2022-07-26 同济大学 Collision jam detection method and device
CN113918836B (en) * 2021-09-14 2024-12-17 国网新疆电力有限公司信息通信公司 Space query verification method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609441A (en) * 2011-12-27 2012-07-25 中国科学院计算技术研究所 Local-sensitive hash high-dimensional indexing method based on distribution entropy
CN104035949A (en) * 2013-12-10 2014-09-10 南京信息工程大学 Similarity data retrieval method based on locality sensitive hashing (LASH) improved algorithm
CN104199827A (en) * 2014-07-24 2014-12-10 北京大学 Locality-sensitive-hashing-based high-dimensional indexing method for large-scale multimedia data

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101478608A (en) * 2009-01-09 2009-07-08 南京联创科技股份有限公司 Fast operating method for mass data based on two-dimensional hash
CN103324650A (en) * 2012-10-23 2013-09-25 深圳市宜搜科技发展有限公司 Image retrieval method and system
CN103336801B (en) * 2013-06-20 2016-08-10 河海大学 Remote sensing image retrieval method based on multiple features LSH index combination
CN104699701A (en) * 2013-12-05 2015-06-10 深圳先进技术研究院 Parallel nearest node computing method and distributed system based on sensitive hashing

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609441A (en) * 2011-12-27 2012-07-25 中国科学院计算技术研究所 Local-sensitive hash high-dimensional indexing method based on distribution entropy
CN104035949A (en) * 2013-12-10 2014-09-10 南京信息工程大学 Similarity data retrieval method based on locality sensitive hashing (LASH) improved algorithm
CN104199827A (en) * 2014-07-24 2014-12-10 北京大学 Locality-sensitive-hashing-based high-dimensional indexing method for large-scale multimedia data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于位置敏感哈希的近似kNN查询算法研究;邱文明;《中国优秀硕士学位论文数据库 信息科技辑》;20150715(第7期);第33-43页第4.2-4.5节、结论 *

Also Published As

Publication number Publication date
CN105760468A (en) 2016-07-13
CN105760468B (en) 2019-05-31
CN110175258A (en) 2019-08-27

Similar Documents

Publication Publication Date Title
CN110175258B (en) Mobile perception data query method based on position sensitive hash index
CN110046268B (en) High-dimensional space kNN query method based on inverted position sensitive hash index
US10102227B2 (en) Image-based faceted system and method
RU2533441C2 (en) Method and apparatus for facilitating content-based image search
TWI745818B (en) Method and electronic equipment for visual positioning and computer readable storage medium thereof
US20190188222A1 (en) Thumbnail-Based Image Sharing Method and Terminal
WO2022198853A1 (en) Task scheduling method and apparatus, electronic device, storage medium, and program product
TWI761851B (en) Image processing method, image processing apparatus, electronic device, and computer-readable storage medium
CN102831405B (en) Method and system for outdoor large-scale object identification on basis of distributed and brute-force matching
EP1324223A2 (en) Apparatus and method for searching multimedia objects
CN105824928A (en) Mobile terminal, server, content-based image recognition search method and system
TWI785267B (en) Method and electronic apparatus for image processing and storage medium thereof
CN110083762B (en) Room source searching method, device and equipment and computer readable storage medium
CN110347858B (en) Picture generation method and related device
CN102880854A (en) Distributed processing and Hash mapping-based outdoor massive object identification method and system
WO2022100290A1 (en) Image search method and computing device
CN103999097B (en) System and method for compact descriptor for visual search
CN112115281A (en) Data retrieval method, device and storage medium
CN110263830B (en) Image processing method, device and system and storage medium
CN106021423B (en) META Search Engine personalization results recommended method based on group division
CN114691940A (en) Index construction method and device, vector search method and retrieval system
KR101896177B1 (en) An Image Searching System Providing Multiplex Result
CN111667347A (en) Big data type searching system based on 5G communication technology and searching method thereof
CN112307829B (en) Digital retina mass target retrieval space-time matrix presentation method
Huang et al. PickPic: A real-time and low-redundancy image selection framework for disaster perception

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant