CN118503264A - Data query method, apparatus, computer device, readable storage medium, and program product - Google Patents
- Publication number
- CN118503264A (application CN202410638753.9A)
- Authority
- CN
- China
- Prior art keywords
- hot spot
- index
- cluster
- list
- spot list
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to the technical field of big data and discloses a data query method, apparatus, device, medium, and program product. The method comprises: periodically counting the access frequency of each index within a preset time period and sending the access frequencies to a first cluster, which predicts hotspot indexes from the access frequencies reported by all proxy nodes and registers the hotspot indexes and their corresponding data values in a second cluster to form a first hotspot list; monitoring the first hotspot list in the second cluster and caching it in a local storage space to form a second hotspot list; upon obtaining a data query request sent by a client, querying the second hotspot list based on the target index carried in the request; and, if the target index is in the second hotspot list, returning the target data value stored locally for that index to the client. The method reduces the access load on the database.
Description
Technical Field
The present application relates to the field of big data technology, and in particular, to a data query method, apparatus, computer device, computer readable storage medium, and computer program product.
Background
With the rapid development of distributed systems, users and services place ever-higher demands on them, and users expect fast access for a good experience, so emerging databases mainly store data as key-value pairs. In business scenarios such as flash sales, marketing promotions, and breaking news, the system accesses a single key with extremely high concurrency within a unit of time, which readily creates a single-point hot key problem. The performance of a single database node is then severely impacted; the traffic may even exceed the upper limit of the physical network card and cause faults or downtime. Ultimately the back-end relational database is also hit by extremely high traffic, which degrades the quality of the entire service link and may even make the service unavailable.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data query method, apparatus, computer device, computer-readable storage medium, and computer program product that can relieve access pressure on a database.
In a first aspect, the present application provides a data query method, where the method is applied to any proxy node in a proxy cluster, and the method includes:
periodically counting the access frequency of each index within a preset time period, sending the access frequencies to a first cluster, and instructing the first cluster to predict hotspot indexes from the access frequencies reported by all proxy nodes and to register the hotspot indexes and their corresponding data values in a second cluster to form a first hotspot list;
monitoring the first hotspot list in the second cluster, and caching the first hotspot list in a local storage space to form a second hotspot list;
upon obtaining a data query request sent by a client, querying the second hotspot list based on a target index carried in the data query request; and
if the target index is in the second hotspot list, returning the target data value corresponding to the target index, stored in the local storage space, to the client.
In one embodiment, the method further comprises:
if the target index is not in the second hotspot list, reading the target data value corresponding to the target index from a database and returning it to the client.
In one embodiment, the method further comprises:
if the access volume of a target hotspot index reaches an upper bandwidth limit, rate-limiting access to that target hotspot index.
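To make the rate-limiting embodiment concrete, the sketch below applies a token bucket per hotspot index. The patent does not say how the limit is enforced; the token bucket, the hypothetical `HotKeyLimiter` class, and measuring the "bandwidth limit" in response bytes per second are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Set


class TokenBucket:
    """Refills `rate` units per second, holding at most `capacity` units."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float]) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_consume(self, amount: float) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False  # budget exhausted: throttle this access


class HotKeyLimiter:
    """Hypothetical limiter: only keys marked hot are subject to the limit."""

    def __init__(self, bytes_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.bytes_per_sec = bytes_per_sec
        self.clock = clock
        self.hot_keys: Set[str] = set()
        self.buckets: Dict[str, TokenBucket] = {}

    def mark_hot(self, key: str) -> None:
        self.hot_keys.add(key)

    def allow(self, key: str, response_bytes: int) -> bool:
        if key not in self.hot_keys:
            return True
        bucket = self.buckets.get(key)
        if bucket is None:
            # One second's worth of budget as burst capacity (an assumption).
            bucket = TokenBucket(self.bytes_per_sec, self.bytes_per_sec,
                                 self.clock)
            self.buckets[key] = bucket
        return bucket.try_consume(response_bytes)


now = [0.0]  # fake clock so the example is deterministic
limiter = HotKeyLimiter(bytes_per_sec=1000, clock=lambda: now[0])
limiter.mark_hot("item:42")
print(limiter.allow("item:42", 600))  # True
print(limiter.allow("item:42", 600))  # False
now[0] = 1.0
print(limiter.allow("item:42", 600))  # True
```

With a 1000-byte-per-second budget, two 600-byte reads of the same hot key at the same instant let the first through and throttle the second; keys that are not marked hot are never limited.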
In one embodiment, the method further comprises:
if an update operation is performed on the first hotspot list, updating the second hotspot list according to the updated first hotspot list, so that the updated second hotspot list is consistent with the updated first hotspot list.
In one embodiment, the update operations include a hotspot index change operation, an expiration operation, and a deletion operation.
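A minimal sketch of how a proxy node might apply the three update operations to its second hotspot list, assuming each operation arrives as a small event object. The `UpdateOp` and `LocalHotspotList` names, the event fields, and the lazy eviction of expired entries are hypothetical; the patent names the operation types but not their encoding.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class UpdateOp:
    """Hypothetical update event mirrored from the first hotspot list."""
    kind: str                          # "change", "expire", or "delete"
    key: str
    value: Optional[bytes] = None      # new data value for "change"
    expire_at: Optional[float] = None  # absolute deadline for "expire"


class LocalHotspotList:
    """Second hotspot list: key -> (data value, optional expiry deadline)."""

    def __init__(self) -> None:
        self.entries: Dict[str, Tuple[bytes, Optional[float]]] = {}

    def apply(self, op: UpdateOp) -> None:
        if op.kind == "change":
            if op.value is None:
                raise ValueError("change operation needs a value")
            self.entries[op.key] = (op.value, None)  # new value, no deadline
        elif op.kind == "expire":
            if op.key in self.entries:
                value, _ = self.entries[op.key]
                self.entries[op.key] = (value, op.expire_at)
        elif op.kind == "delete":
            self.entries.pop(op.key, None)
        else:
            raise ValueError(f"unknown update operation: {op.kind!r}")

    def get(self, key: str, now: float) -> Optional[bytes]:
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if deadline is not None and now >= deadline:
            del self.entries[key]  # evict lazily on first read after expiry
            return None
        return value
```

Resetting the deadline on a "change" is a design choice of this sketch; a real system might preserve it.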
In one embodiment, the method further comprises:
pushing the second hotspot list to the other proxy nodes in the proxy cluster, so that the second hotspot lists stored locally on all proxy nodes in the proxy cluster are consistent.
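One way to keep the copies on all proxy nodes consistent is to attach a version number to the list and let each peer accept a push only if it is newer. The sketch below assumes that scheme; the hypothetical `ProxyNode` class and the versioning rule are illustrative and are not taken from the patent.

```python
from typing import Dict, List


class ProxyNode:
    """Hypothetical proxy node holding a versioned copy of the hotspot list."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.version = 0
        self.hotspots: Dict[str, bytes] = {}
        self.peers: List["ProxyNode"] = []

    def receive(self, version: int, snapshot: Dict[str, bytes]) -> bool:
        # Reject stale or duplicate pushes so that out-of-order delivery
        # cannot roll a node back to an older list.
        if version <= self.version:
            return False
        self.version = version
        self.hotspots = dict(snapshot)  # private copy per node
        return True

    def push_to_peers(self) -> int:
        """Send this node's list to every peer; return how many accepted it."""
        return sum(peer.receive(self.version, self.hotspots)
                   for peer in self.peers)
```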
In a second aspect, the present application further provides a data query device, including:
a hotspot discovery module, configured to periodically count the access frequency of each index within a preset time period, send the access frequencies to a first cluster, and instruct the first cluster to predict hotspot indexes from the access frequencies reported by all proxy nodes and to register the hotspot indexes and their corresponding data values in a second cluster to form a first hotspot list;
a hotspot monitoring module, configured to monitor the first hotspot list in the second cluster and cache it in a local storage space to form a second hotspot list;
a local query module, configured to query the second hotspot list based on a target index carried in a data query request upon obtaining the data query request sent by a client; and
a hotspot feedback module, configured to return the target data value corresponding to the target index, stored in the local storage space, to the client if the target index is in the second hotspot list.
In a third aspect, the present application also provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the data query method described above when executing the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data query method described above.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the data query method described above.
According to the above data query method, apparatus, computer device, computer-readable storage medium, and computer program product, any proxy node in the proxy cluster periodically counts the access frequency of each index within a preset time period, and every proxy node sends its statistics to the first cluster. Counting on the proxy nodes relieves pressure on the server and avoids memory leaks on the client. The first cluster predicts hotspot indexes from a global view based on the access frequencies reported by all proxy nodes, which is more accurate than detection by a single client or by a database's built-in monitoring command, makes hotspot statistics easier to aggregate at scale, and improves the efficiency of data collection and analysis. The predicted hotspot indexes and their data values are then registered in the second cluster to form the first hotspot list, which provides centralized management of, and fast access to, hotspot indexes. Finally, each proxy node monitors the first hotspot list in the second cluster and caches it in its local storage space as the second hotspot list, so that a client's query request can be answered directly from the second hotspot list. Query requests are therefore served quickly and the access pressure on the database is relieved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are needed in the description of the embodiments of the present application or the related technologies will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other related drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is an application environment diagram of a data query method in one embodiment;
FIG. 2 is a flow diagram of a data query method in one embodiment;
FIG. 3 is an overall architecture diagram of a data query system in one embodiment;
FIG. 4 is a flow diagram of a data query method in another embodiment;
FIG. 5 is a structural block diagram of a data query apparatus in one embodiment;
FIG. 6 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
With the rapid development of distributed systems, users and businesses place ever-higher demands on them, and users expect fast access for a good experience. Emerging databases therefore mainly store data as key-value pairs, serving business scenarios such as user sessions, trending microblog posts, and flash-sale promotions, in order to speed up system access and relieve pressure on the database.
In scenarios such as flash sales, marketing promotions, and breaking news, the system accesses a single key with extremely high concurrency within a unit of time, which readily creates a single-point hot key problem. The performance of a single database node is then severely impacted; the traffic may even exceed the upper limit of the physical network card and cause faults or downtime. Ultimately the back-end relational database is also hit by extremely high traffic, which degrades the quality of the entire service link and may even make the service unavailable.
In the embodiments of the present application, the proxy nodes count the access frequency of each index, and the first cluster predicts hotspot indexes from a global view based on the access frequencies reported by all proxy nodes and registers the predicted hotspot indexes and their data values in the second cluster to form the first hotspot list. The proxy nodes monitor the first hotspot list in the second cluster and cache it in their local storage space to form the second hotspot list, so that a client's query request can be answered directly from the second hotspot list. Query requests are thus served quickly and the access pressure on the database is relieved.
The data query method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with any proxy node 106 in the proxy cluster 104 over a network, and the database 108 may store the data that the proxy node 106 needs to process. The proxy node 106 periodically counts the access frequency of each index within a preset time period, sends the access frequencies to the first cluster, and instructs the first cluster to predict hotspot indexes from the access frequencies reported by all proxy nodes and to register the hotspot indexes and their data values in the second cluster to form the first hotspot list. The proxy node 106 monitors the first hotspot list in the second cluster and caches it in its local storage space to form the second hotspot list. Upon obtaining a data query request sent by a client, the proxy node 106 queries the second hotspot list based on the target index carried in the request, and if the target index is in the second hotspot list, returns the target data value stored locally for that index to the client. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, Internet of Things device, or portable wearable device; the Internet of Things device may be a smart in-vehicle device, a projection device, or the like, and the portable wearable device may be a smart watch, smart bracelet, head-mounted device, or the like. The head-mounted device may be a virtual reality (VR) device, an augmented reality (AR) device, smart glasses, or the like.
The proxy cluster 104 consists of multiple mutually independent proxy nodes 106 that are connected to one another over a network and managed uniformly as a single system. Its main purpose is to improve system availability and scalability, so that data query requests are still handled efficiently under high load. A proxy node 106 is a service node in the proxy cluster 104 responsible for processing data query requests from clients; each proxy node 106 can process data independently and can communicate with the other proxy nodes to share data and resources.
In an exemplary embodiment, as shown in FIG. 2, a data query method is provided. Taking its application to the proxy node in FIG. 1 as an example, the method includes the following steps:
Step 202, periodically count the access frequency of each index within a preset time period, send the access frequencies to the first cluster, and instruct the first cluster to predict hotspot indexes from the access frequencies reported by all proxy nodes and to register the hotspot indexes and their data values in the second cluster to form a first hotspot list.
The preset time period refers to a specific time period preset when the proxy node counts the index access frequency. For example, the frequency of access to each index over the past hour, day, or week is counted.
An index is a unique identifier, or key, of a data value; in a database or search engine it is used to quickly retrieve and locate specific data records. A data value is the actual data or information associated with an index and is the target of a client's query. A data query request typically specifies one or more indexes, which the proxy node uses to look up the corresponding data values locally or in a remote cluster.
The access frequency of an index is the number of times, or the rate at which, the index is accessed within a given period of time; it is the key indicator for evaluating how hot an index is.
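As a rough sketch of per-proxy counting, the class below accumulates access counts per index and emits a report once the preset period has elapsed. The hypothetical `AccessCounter` class, the report layout (`node`, `period_s`, `frequencies`), and the injectable clock are assumptions; the description of step 202 only says the message typically contains the index identifier and its access frequency.

```python
import time
from collections import Counter
from typing import Callable, Dict, Optional


class AccessCounter:
    """Counts accesses per index within a preset statistics period."""

    def __init__(self, node_id: str, period_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.node_id = node_id
        self.period_s = period_s
        self.clock = clock
        self.window_start = clock()
        self.counts: Counter = Counter()

    def record(self, index: str) -> None:
        """Call once per data query request that targets `index`."""
        self.counts[index] += 1

    def maybe_flush(self) -> Optional[Dict[str, object]]:
        """Return a report if the period has elapsed, then start a new one."""
        now = self.clock()
        if now - self.window_start < self.period_s:
            return None
        report = {
            "node": self.node_id,
            "period_s": self.period_s,
            "frequencies": dict(self.counts),  # index -> accesses in period
        }
        self.counts.clear()
        self.window_start = now
        return report
```

Reporting raw counts keeps the proxy side cheap; dividing by `period_s` to get a rate can be left to the aggregating cluster.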
The first cluster is a system or cluster responsible for processing the access frequency data sent by the proxy nodes and predicting hotspot indexes from the access frequencies reported by all proxy nodes. For example, the first cluster may collect the keys reported by each proxy node, aggregate, count, analyze, and compute over them in real time, predict which keys have a high access frequency, and write those keys to the second cluster.
A hotspot index is an index that is accessed frequently.
The second cluster acts as the coordinator of the distributed system. It monitors operations on hotspot keys, such as addition, invalidation, and rate limiting, in real time, and notifies all proxy nodes in the proxy cluster to update, invalidate, or rate-limit their local caches accordingly, keeping the proxy nodes' local caches consistent with the cached data. For example, the second cluster may be a distributed, highly available key-value store used mainly for shared configuration and service discovery. The first cluster and the second cluster are two independent compute or storage clusters; they may be located in different physical locations or consist of different hardware and software.
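The watch/notify role of the second cluster can be illustrated with an in-memory stand-in. The sketch assumes an event tuple `(type, key, value)` and the hypothetical method names `register`, `invalidate`, and `limit`; a production system would use a replicated key-value store with its own watch API rather than this class.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (event type, hotspot key, data value or None)
Event = Tuple[str, str, Optional[bytes]]


class HotspotRegistry:
    """In-memory stand-in for the second cluster holding the first hotspot list.

    Only the watch/notify contract is modelled; replication, persistence,
    and network delivery are out of scope.
    """

    def __init__(self) -> None:
        self.first_list: Dict[str, bytes] = {}
        self.watchers: List[Callable[[Event], None]] = []

    def watch(self, callback: Callable[[Event], None]) -> None:
        """Subscribe a proxy node to all future changes."""
        self.watchers.append(callback)

    def _notify(self, event: Event) -> None:
        for callback in self.watchers:
            callback(event)

    def register(self, key: str, value: bytes) -> None:
        """Add or change a hotspot key (written by the first cluster)."""
        self.first_list[key] = value
        self._notify(("put", key, value))

    def invalidate(self, key: str) -> None:
        """Remove a hotspot key and tell proxies to drop their local copy."""
        if key in self.first_list:
            del self.first_list[key]
            self._notify(("delete", key, None))

    def limit(self, key: str) -> None:
        """Tell proxies to rate-limit access to a hotspot key."""
        self._notify(("limit", key, None))
```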
Specifically, the proxy node is preconfigured with a statistics period, during which it records the indexes accessed by all clients' data query requests. When the period ends, the proxy node calculates the number of accesses, or the access frequency, of each index in that period, packs the statistics into a message (typically with fields such as the index identifier and its access frequency), and sends the message to one or more nodes of the first cluster. The nodes of the first cluster receive the index access frequencies from the proxy nodes and aggregate them into global index access frequencies. Based on the aggregated frequencies, the first cluster predicts the hotspot indexes with an algorithm (for example, a simple threshold rule, time-series analysis, or machine learning) and registers the hotspot indexes and their data values in the second cluster. The nodes of the second cluster receive the hotspot indexes and data values from the first cluster and form the first hotspot list, which the second cluster stores in its internal storage system or cache and periodically updates or refreshes so that it contains the latest hotspot indexes and data values.
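The aggregation and the simple threshold rule mentioned above can be sketched as three small functions. The threshold, the `top_n` cap, the tie-breaking order, and the hypothetical `load_value` callback used to fetch each hot key's data value are assumptions; the patent leaves the concrete algorithm open, and time-series or machine-learning predictors would replace `predict_hotspots`.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Mapping


def aggregate(reports: Iterable[Mapping[str, int]]) -> Counter:
    """Merge per-proxy access counts into global, cluster-wide counts."""
    total: Counter = Counter()
    for frequencies in reports:
        total.update(frequencies)  # adds counts key by key
    return total


def predict_hotspots(global_counts: Mapping[str, int], threshold: int,
                     top_n: int) -> List[str]:
    """Threshold rule: keys at or above `threshold`, hottest first, capped."""
    hot = [key for key, count in global_counts.items() if count >= threshold]
    hot.sort(key=lambda key: (-global_counts[key], key))
    return hot[:top_n]


def register_hotspots(keys: Iterable[str], load_value: Callable[[str], bytes],
                      first_list: Dict[str, bytes]) -> None:
    """Write each hotspot key and its data value into the first hotspot list."""
    for key in keys:
        first_list[key] = load_value(key)
```

Aggregating before thresholding is what gives the "global view": a key that is moderately popular on every proxy can cross the threshold even if no single proxy sees it as hot.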
Step 204, monitor the first hot spot list in the second cluster, and cache the first hot spot list in the local storage space to form a second hot spot list.
In existing non-relational database products, access mainly follows a client-server model, and hot keys are mainly detected in three ways: active discovery on the client, the database's built-in monitoring command, and network packet capture. The built-in monitoring command affects performance and security under high concurrency. Packet capture has a high development cost, can interfere with the network or even cause packet loss under high concurrency, and provides no active defense once a hot key is discovered, so the system's stability is still severely impacted.
Consequently, in large-scale distributed systems, especially those handling large numbers of client requests and server responses, these hotspot discovery approaches suffer from client memory leaks, high maintenance cost, and difficulty aggregating statistics at scale. In the embodiments of the present application, the proxy nodes count the access frequency of each index, and the first cluster predicts hotspot indexes from a global view based on the access frequencies reported by all proxy nodes. This is more accurate than detection by a single client or by a database's built-in monitoring command, makes hotspot statistics easier to aggregate at scale, and improves the efficiency of data collection and analysis.
Monitoring means that the proxy node checks or receives the update from the second cluster in real time to acquire the latest first hot spot list.
Local storage space refers to memory or disk space on a proxy node that is used to store data. The proxy node caches the received first hot spot list in its local storage space for quick access when needed.
The second hot spot list is a copy of the first hot spot list cached by the proxy node in the local storage space. Since it is stored locally at the proxy node, it may be referred to as a "second" hotspot list, to distinguish it from the original "first" hotspot list stored in the second cluster.
Specifically, the proxy node is preconfigured with a listening mechanism that either receives update notifications from the second cluster or periodically queries the second cluster for the latest first hotspot list. When the second cluster updates the first hotspot list, the proxy node receives a list update message through the listening mechanism; the message is typically a list or other data structure containing the hotspot indexes and their associated information. The proxy node stores the received first hotspot list in its local storage space, and this local copy is the second hotspot list. The proxy node updates its second hotspot list periodically so that it stays synchronized with the first hotspot list in the second cluster.
Step 206, upon obtaining a data query request sent by the client, query the second hotspot list based on the target index carried in the data query request.
A client is the party in network communication that actively initiates requests and receives service responses. It generally refers to a software program running on an operating system that obtains the desired information or services by communicating with a server.
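For the polling variant of the listening mechanism, a proxy node can compare a revision number on each fetch and replace its local copy only when the list has changed. In this sketch, `fetch_snapshot` is a hypothetical client call returning `(revision, first_hotspot_list)`, and the revision scheme itself is an assumption.

```python
from typing import Callable, Dict, Tuple

Snapshot = Tuple[int, Dict[str, bytes]]  # (revision, first hotspot list)


class HotspotListener:
    """Keeps a local second hotspot list in sync by polling the second cluster."""

    def __init__(self, fetch_snapshot: Callable[[], Snapshot]) -> None:
        self.fetch_snapshot = fetch_snapshot  # hypothetical cluster client call
        self.revision = -1                    # nothing cached yet
        self.second_list: Dict[str, bytes] = {}

    def poll_once(self) -> bool:
        """Fetch the latest list; replace the local copy only if it is newer."""
        revision, first_list = self.fetch_snapshot()
        if revision <= self.revision:
            return False
        # Copy so later changes to the fetched object cannot leak into the cache.
        self.second_list = dict(first_list)
        self.revision = revision
        return True
```

A proxy would call `poll_once` on a timer; the push-notification variant would instead apply each change event as it arrives.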
A data query request is a request sent by a client to a proxy node for retrieving specific data from a storage system. The data query request typically contains information of a target index for instructing the proxy node to query for a target data value corresponding to the target index.
The target index is a key parameter carried in the data query request and is used for designating the index to be queried. The target index is a key identification that the proxy node uses when querying the second hotspot list.
Specifically, after receiving a data query request from a client, the proxy node analyzes the received data query request and extracts a target index therein. The proxy node queries in its locally stored second hotspot list based on the extracted target index.
Step 208, if the target index is in the second hotspot list, feeding back the target data value corresponding to the target index stored in the local storage space to the client.
Specifically, if a data item matching the target index is found in the second hotspot list, the proxy node will return the target data value corresponding to the target index to the client. If no match is found, the proxy node may need to take other actions, such as querying the original data source, returning a null result, or sending an error message to the client, etc.
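The lookup in steps 206 and 208 amounts to a dictionary probe on the local list. The sketch assumes the request has already been decoded into a dict with a `key` field; the `QueryResult` type, the `query_local` function, and that request shape are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class QueryResult:
    hit: bool
    value: Optional[bytes] = None


def query_local(request: Dict[str, str],
                second_list: Dict[str, bytes]) -> QueryResult:
    """Probe the proxy's local second hotspot list for the target index.

    `request` is assumed to be an already-decoded command such as
    {"op": "GET", "key": "item:42"}; the wire format is not specified.
    """
    target_index = request["key"]
    if target_index in second_list:
        return QueryResult(hit=True, value=second_list[target_index])
    # Miss: the caller chooses the next action (query the original data
    # source, return an empty result, or report an error).
    return QueryResult(hit=False)
```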
In the above data query method, any proxy node in the proxy cluster periodically counts the access frequency of each index within a preset time period, and every proxy node sends its statistics to the first cluster. Counting on the proxy nodes relieves pressure on the server and avoids client memory leaks. The first cluster predicts hotspot indexes from a global view based on the access frequencies reported by all proxy nodes, which is more accurate than detection by a single client or by a database's built-in monitoring command, makes hotspot statistics easier to aggregate at scale, and improves the efficiency of data collection and analysis. Registering the predicted hotspot indexes and their data values in the second cluster as the first hotspot list provides centralized management of, and fast access to, hotspot indexes. Finally, each proxy node monitors the first hotspot list and caches it locally as the second hotspot list, so client queries are answered directly from the second hotspot list, responses are fast, and the access pressure on the database is relieved.
In one embodiment, the data query method further includes:
If the target index is not in the second hot spot list, reading a target data value corresponding to the target index from the database, and feeding back the target data value to the client.
The database is an organized and structured data set, and allows users to add, delete, modify, search and the like data through a certain query language (such as SQL). The database is of various types, for example, the database may be a cache cluster, which is a non-relational database storing data in key pairs. In short, the Key is used for indexing to realize the functions of storing, modifying, inquiring and deleting data. And the storage nodes in the cache cluster are used for receiving the client requests forwarded by the proxy nodes and storing a cache index (key) and a data value (value).
Specifically, after receiving a data query request from a client, the proxy node analyzes the received data query request and extracts a target index therein. The proxy node queries in its locally stored second hotspot list based on the extracted target index. If the target index is not in the second hot spot list, the proxy node sends the data query request of the client to the database, the database acquires the data value corresponding to the target index, the data value is fed back to the proxy node, and the proxy node feeds back the data value to the client.
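A minimal read-through handler for this embodiment, assuming the database is reachable through a simple `db_get(key)` callable (hypothetical; the patent does not specify the client API). The `ProxyQueryHandler` name and its `db_reads` counter are illustrative; the counter only shows which requests reach the database.

```python
from typing import Callable, Dict, Optional


class ProxyQueryHandler:
    """Serve hot keys from the local list; forward misses to the database."""

    def __init__(self, second_list: Dict[str, bytes],
                 db_get: Callable[[str], Optional[bytes]]) -> None:
        self.second_list = second_list
        self.db_get = db_get  # hypothetical call to the cache cluster/database
        self.db_reads = 0

    def handle(self, target_index: str) -> Optional[bytes]:
        value = self.second_list.get(target_index)
        if value is not None:
            return value  # hot key: answered without touching the database
        self.db_reads += 1
        return self.db_get(target_index)
```

Database results are not written back into the local list here, on the assumption that only the first cluster decides which keys are hot.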
In this embodiment, the proxy nodes perform periodic statistics, the first cluster predicts hotspots, the second cluster monitors them, and each proxy node stores the hotspot list locally, so a client query can be answered directly from the second hotspot list. Compared with the conventional approach, in which the server predicts and updates hotspots and the hotspot list is stored locally on the terminal or the server, this offers the following significant advantages:
(1) Reduced server load: in the conventional approach, the server must handle client requests while also predicting and updating hotspots, which can exhaust server resources and degrade performance under high concurrency. In the proxy-node approach, hotspot prediction and updating are handled by the first and second clusters, and the proxy node only collects statistics and queries its local hotspot list, which greatly reduces the load on the server.
(2) Reduced network load: in the conventional approach, each client must fetch the latest hotspot list from the server. In a large distributed system this means heavy network transfer and synchronization, which can cause network congestion and performance degradation. Because the proxy nodes store the hotspot list locally, clients obtain hotspot data directly from the proxy node without frequent exchanges with the server, which reduces network load.
(3) Improved query efficiency: when a client sends a query request and the target index is in the proxy node's local hotspot list, the proxy node returns the corresponding data value directly without querying the server. Serving queries from this local cache can significantly improve query efficiency and reduce query latency.
(4) Better fault tolerance and reliability: in the conventional approach, a server failure or performance bottleneck directly affects the stability and availability of the whole system. In the proxy-node approach, hotspot prediction and updating are shared among several clusters, so a failure or bottleneck in one cluster has limited impact on the system as a whole. The proxy nodes' local caches also contribute to fault tolerance and reliability to some extent.
(5) Lower maintenance cost: when the hotspot list is stored locally on each terminal, every terminal must maintain its own list and keep it synchronized with the server, which increases maintenance complexity and cost. The proxy-node approach concentrates maintenance on the proxy nodes and the clusters, which lowers maintenance cost and requires no intrusion into, or modification of, the client.
In one embodiment, the data query method further includes:
If the access volume of the target hotspot index reaches the upper bandwidth limit, access to the target hotspot index is throttled.
Access volume refers to the number or amount of accesses to a system, service, or resource (e.g., a hotspot index) within a specific period (e.g., per second, minute, hour, or day). It is commonly used to measure how busy a system or service is and how active its users are.
The upper bandwidth limit refers to the maximum amount of data that a system or network can process or transmit in a particular period of time. For example, the upper bandwidth limit may be 80% of the physical machine network card bandwidth. For a server or network service, the upper bandwidth limit is typically related to hardware performance, network bandwidth, and quality of service (QoS) settings. When traffic exceeds the upper bandwidth limit, performance degradation, increased delay, or data loss may result.
Throttling (rate limiting) is a technique for controlling how often a resource is accessed. When the access volume reaches or exceeds a preset threshold, the throttling policy limits the processing rate or number of subsequent requests to prevent the system from being overloaded. Throttling helps the system remain stable and improves the user experience. Throttling policies may include: rejecting requests that exceed the threshold; delaying the processing of requests that exceed the threshold; and queuing requests that exceed the threshold and processing them once bandwidth recovers. A suitable throttling algorithm and policy may be chosen according to business requirements.
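The text leaves the throttling algorithm open. One common choice, swapped in here purely as an illustration, is a token bucket: tokens refill at a fixed rate up to a burst capacity, and each request consumes one. The Python sketch below implements only the "reject" policy from the list above; the `TokenBucket` class and its `rate`, `capacity`, and injectable `clock` parameters are hypothetical, and a delaying or queuing policy would replace the `return False` branch.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter using the 'reject' policy."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.clock = clock        # injectable for deterministic tests
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        # Over the limit: reject (delay or queue are the alternatives).
        return False
```

A proxy node would keep one bucket per throttled hot key and call `allow()` before serving each request for that key.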
Specifically, the first cluster computes the access frequency and data packet size of the target hotspot index in real time and determines whether the resulting traffic reaches the upper bandwidth limit. If it does, the first cluster sends a throttling indication message to all proxy nodes, which triggers the throttling mechanism, and access to the target hotspot index is limited according to a preset throttling policy.
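The bandwidth check performed by the first cluster can be sketched as a single comparison. This assumes, hypothetically, that traffic is estimated as access frequency multiplied by the average packet (value) size, and it uses the 80% network-card ratio given as an example in the text; the function name and parameters are illustrative, not taken from the source.

```python
def should_throttle(requests_per_sec: float,
                    avg_packet_bytes: float,
                    nic_bandwidth_bps: float,
                    ratio: float = 0.8) -> bool:
    """Return True when a hot key's estimated traffic reaches the ceiling.

    Assumptions (hypothetical): traffic ~= frequency x packet size, and the
    ceiling is `ratio` times the NIC bandwidth in bits per second.
    """
    estimated_bps = requests_per_sec * avg_packet_bytes * 8  # bytes -> bits
    return estimated_bps >= ratio * nic_bandwidth_bps


# Example: a 100 KB value requested 12,000 times per second on a 10 Gbit/s NIC.
TEN_GBIT = 10_000_000_000
print(should_throttle(12_000, 100_000, TEN_GBIT))  # True (9.6e9 >= 8e9 bit/s)
```

When the check returns True, the first cluster would notify every proxy node, and each node would start applying its throttling policy to that key.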
In some embodiments, FIG. 3 is an overall architecture diagram of a data query system in one embodiment. As shown in FIG. 3, the proxy node serves as a proxy layer that routes client requests to the appropriate back-end storage nodes and requires zero intrusion into the application client (APP-Client) and the cache server. It consists mainly of three modules:
Hotspot maintenance module: caches, updates, and invalidates the keys and values of hot keys locally, maintaining hot key data in real time.
Hotspot throttling module: throttles hot keys to prevent systemic risks, such as network bandwidth congestion, caused by applications accessing large keys at high frequency.
Communication module: listens for the discovery, invalidation, update, and similar events of hot keys and notifies each proxy node so that it can maintain its hot keys.
In this embodiment, access to the target hotspot index is throttled when its access volume reaches the upper bandwidth limit, so requests exceeding the threshold are limited and the access pressure on the underlying database is reduced, which preserves the availability and service stability of that database.
In one embodiment, the data query method further includes:
If an update operation is performed on the first hotspot list, the second hotspot list is updated according to the updated first hotspot list, so that the updated second hotspot list is consistent with the updated first hotspot list.
An update operation is a modification of the first hotspot list, for example adding new hotspots, deleting old hotspots, or modifying the attributes or order of hotspots.
In some embodiments, the update operations include a hotspot index change operation, an expiration operation, and a delete operation.
A hotspot index change operation modifies or replaces an index regarded as a "hotspot". It may modify the value, attributes, or structure of the index, or move the index from one location to another.
An expiration operation marks an index or data value as expired or no longer valid. In some systems, the data values corresponding to indexes are time-sensitive: after a period of time they may no longer be needed or valid. Expiration clears data that is no longer needed, frees storage space, and ensures that the system processes only valid data. It is typically implemented by setting a timestamp or validity period and triggering expiration once a data item exceeds that limit.
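Expiration based on a validity period can be illustrated with a small TTL-aware version of the local hotspot list. The Python sketch below rests on assumptions: `ExpiringHotList` and its injectable `clock` are hypothetical, and entries are expired lazily when they are read, which is only one possible strategy (a periodic background sweep is another).

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringHotList:
    """Hypothetical local hotspot list whose entries carry an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # key -> (data value, absolute expiry timestamp)
        self._entries: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def put(self, key: str, value: str, ttl_seconds: float) -> None:
        # Record the validity period as an absolute deadline.
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Past its validity period: expire it and free the slot.
            del self._entries[key]
            return None
        return value

    def __len__(self) -> int:
        return len(self._entries)
```

A `None` result from `get` would send the proxy node down the database path, just as for a key that was never hot.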
A delete operation refers to the act of removing a certain index or data item. This is typically because the data value corresponding to the index is no longer needed, is no longer relevant, or has been replaced by another data item. The deleting operation can release the storage space, improve the system performance and ensure the accuracy and consistency of the data.
Specifically, if the client calls an update operation such as del (delete) or update, the value corresponding to the current hotspot key becomes invalid. The second cluster monitors additions, invalidations, and similar operations on hotspot keys, updates the first hotspot list, and notifies all proxy nodes in the proxy cluster in real time to update or invalidate their local caches. Each proxy node detects that the first hotspot list has been updated and updates its second hotspot list accordingly, so that the updated second hotspot list is consistent with the updated first hotspot list.
In this embodiment, when the value of a hotspot key changes (e.g., is deleted or updated), the first hotspot list is updated in real time and the proxy nodes are notified to update or invalidate their locally cached second hotspot lists. This ensures that all proxy nodes serve hotspot data based on the latest state, which reduces errors and confusion caused by inconsistent data. Concentrating the management logic for hotspot data in the second cluster also reduces overall system complexity: the client and the proxy nodes need not know where hotspot data is stored or how it is updated, and can focus on their own business logic. It also makes hotspot data easier to manage and optimize centrally.
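How a proxy node might apply one change notification to its second hotspot list is sketched below. The source does not define a message format, so the `Op` enum, the `HotListEvent` record, and `apply_event` are hypothetical; the three operation kinds mirror the change, expiration, and delete operations described above, with expiration and deletion both treated as removal of the local copy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Op(Enum):
    CHANGE = "change"  # hotspot index change: new or updated value
    EXPIRE = "expire"  # entry marked as no longer valid
    DELETE = "delete"  # entry removed


@dataclass
class HotListEvent:
    op: Op
    key: str
    value: Optional[str] = None  # only meaningful for CHANGE


def apply_event(second_hot_list: Dict[str, str], event: HotListEvent) -> None:
    """Bring the local second list in line with one change to the first list."""
    if event.op is Op.CHANGE:
        if event.value is None:
            raise ValueError("CHANGE events must carry a value")
        second_hot_list[event.key] = event.value
    else:
        # EXPIRE and DELETE both invalidate the local copy; a missing key is fine.
        second_hot_list.pop(event.key, None)
```

Treating removal of an absent key as a no-op makes the handler safe to run twice for the same notification, which helps if the listener may see duplicates.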
In one embodiment, the data query method further includes:
Pushing the second hotspot list to the other proxy nodes in the proxy cluster, so that the second hotspot lists stored locally by all proxy nodes in the proxy cluster are consistent.
The local caches of the proxy nodes must be kept consistent for the following reason. Suppose another proxy node does not update its locally stored second hotspot list. If an application request deletes a hotspot key and the next request to read that key is routed to the other proxy node, that node will still return the cached data value even though the key's value has actually been deleted, which is incorrect.
Specifically, when a change occurs to the second hotspot list (e.g., a new hotspot data item is added, or an old hotspot data item is removed or updated), the current proxy node "pushes" the changed information to all other proxy nodes in the proxy cluster.
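The push step, and the stale read it prevents, can be shown with in-process stand-ins for proxy nodes. In this hypothetical Python sketch the `Proxy` class, the `connect` helper, and the method names are illustrative only, and direct method calls replace the network messages that the communication module would actually send.

```python
from typing import Dict, List


class Proxy:
    """In-process stand-in for a proxy node (hypothetical; no real networking)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.hot_list: Dict[str, str] = {}  # locally cached second hotspot list
        self.peers: List["Proxy"] = []

    def receive_invalidation(self, key: str) -> None:
        # Drop the local copy so this node cannot serve a stale value.
        self.hot_list.pop(key, None)

    def handle_write(self, key: str) -> None:
        # A del/update on a hot key invalidates the local copy first...
        self.hot_list.pop(key, None)
        # ...then the invalidation is pushed to every other proxy node.
        for peer in self.peers:
            peer.receive_invalidation(key)


def connect(proxies: List[Proxy]) -> None:
    """Make every proxy aware of all the others."""
    for proxy in proxies:
        proxy.peers = [other for other in proxies if other is not proxy]


# Usage: without the push, node b would keep returning "v1" after a deletes the key.
a, b = Proxy("a"), Proxy("b")
connect([a, b])
a.hot_list["sku:42"] = b.hot_list["sku:42"] = "v1"
a.handle_write("sku:42")
assert "sku:42" not in b.hot_list
```

Pushing only the invalidated key, rather than the whole list, keeps each message small; the text leaves the granularity of the push open.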
In this embodiment, pushing the second hotspot list to the other proxy nodes keeps the locally stored second hotspot lists of all proxy nodes in the proxy cluster consistent, so that every proxy node holds the latest, identical copy. This avoids errors and anomalies caused by inconsistent data, such as reading data that no longer exists or duplicated data, and improves the stability and availability of the system.
In a detailed embodiment, a data query method includes the steps of:
1. The proxy node periodically counts the access frequency of each index within a preset time period and sends the frequencies to the first cluster, instructing the first cluster to predict hotspot indexes from the frequencies reported by all proxy nodes and to register the hotspot indexes and their data values to the second cluster to form a first hotspot list.
2. The proxy node monitors the first hotspot list in the second cluster and caches it in local storage to form a second hotspot list.
3. When the proxy node receives a data query request from a client, it looks up the target index carried in the request in the second hotspot list.
4. If the access volume of the target hotspot index reaches the upper bandwidth limit, go to step 5; otherwise, go to step 6.
5. The proxy node throttles access to the target hotspot index.
6. If the target index is in the second hotspot list, go to step 7; otherwise, go to step 8.
7. The proxy node returns the target data value stored locally for the target index to the client; go to step 9.
8. The proxy node reads the target data value for the target index from the database and returns it to the client; go to step 9.
9. If an update operation is performed on the first hotspot list, the proxy node updates the second hotspot list according to the updated first hotspot list so that the two lists are consistent; the update operations include hotspot index change, expiration, and delete operations. Go to step 10.
10. The proxy node pushes the second hotspot list to the other proxy nodes in the proxy cluster, so that the second hotspot lists stored locally by all proxy nodes remain consistent.
In some embodiments, FIG. 4 is a flowchart of a data query method in another embodiment. As shown in FIG. 4, the application calls the client API to issue read and write requests on keys and values. After receiving requests, each Proxy node periodically (every 10 s) analyzes the access frequency of the requested keys and sends the results to the first cluster. The nodes of the first cluster aggregate, compute, and analyze in real time the per-key access frequencies sent by all Proxy nodes, predict the hot keys, and register the hot keys and their values to the second cluster to obtain the first hotspot list. Once the hot keys and values are registered in the second cluster, all Proxy nodes receive the pushed hot key and value data through the listening mechanism of the communication module and cache it in their local cache space to obtain the second hotspot list. When an application client sends a request to a Proxy node, the Proxy node first checks whether the key is in the second hotspot list. If it is, the locally cached value of the hot key is obtained directly from the hotspot maintenance module without accessing the database. If the key is not hot, the request is forwarded to the database.
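The per-proxy counting window and the first cluster's aggregation can be sketched as below. The source does not say how hot keys are predicted from the aggregated frequencies, so this hypothetical Python sketch simply sums the counts reported by all proxies and keeps keys at or above a fixed `threshold`; `AccessCounter` and `predict_hot_keys` are illustrative names, and the 10 s timer that would call `flush()` and send its result to the first cluster is left out.

```python
from collections import Counter
from typing import Dict, Iterable, List


class AccessCounter:
    """Hypothetical per-proxy access counts for the current statistics window."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, key: str) -> None:
        self._counts[key] += 1

    def flush(self) -> Dict[str, int]:
        # Called once per window (e.g. every 10 s): snapshot, then reset.
        snapshot = dict(self._counts)
        self._counts.clear()
        return snapshot


def predict_hot_keys(reports: Iterable[Dict[str, int]], threshold: int) -> List[str]:
    """First-cluster side: merge all proxies' reports, keep keys >= threshold."""
    total: Counter = Counter()
    for report in reports:
        total.update(report)  # adds counts key by key
    return sorted(key for key, count in total.items() if count >= threshold)
```

Summing across proxies is what gives the "global view": in the test-style scenario where one proxy sees a key 6 times and another sees it 5 times, neither alone crosses a threshold of 10, but the merged count does.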
If the application client calls an update operation such as del (delete) or update, the value corresponding to the current hot key becomes invalid. When a Proxy node processing a request finds such a value-invalidating call on a hot key, it deletes the hot key's local cache entry from the hotspot maintenance module, keeping the local cache consistent with the data in the database cache. The Proxy node also pushes the invalidation event to the other Proxy nodes in the Proxy cluster through the communication module, so that the hot keys on all nodes of the Proxy cluster stay consistent.
If the concurrent access volume of a hot key becomes extremely high, even reaching the bandwidth limit of the physical machine's network card, the hot key must be throttled to prevent it from congesting network traffic. The first cluster computes the access frequency and data packet size of the hot key in real time and determines whether the resulting traffic reaches 80% of the physical machine's network card bandwidth; if so, it notifies all Proxy nodes through the communication module to throttle the key.
In this embodiment, adding a Proxy cluster enables periodic statistics on the access frequency of keys requested by applications; the first cluster aggregates, analyzes, and computes hot keys in real time to predict them; and the Proxy cluster caches and throttles the hot keys locally. This reduces the access pressure on the underlying cache cluster and preserves its system availability and service stability. Meanwhile, the second cluster monitors and broadcasts update and deletion events for hot keys and values, keeping the local caches consistent with the cache cluster. This solves the problem of hot-data access causing a cache avalanche in distributed caches under high-concurrency scenarios such as product flash sales and marketing promotions.
It should be understood that although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and which need not be performed sequentially but may be performed in turn or alternately with other steps or with at least some sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a data query device for realizing the above related data query method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of one or more data query devices provided below may refer to the limitation of the data query method hereinabove, and will not be repeated herein.
In an exemplary embodiment, as shown in fig. 5, there is provided a data query apparatus, including:
the hotspot discovery module 501 is configured to periodically count access frequencies of indexes in a preset period of time, send the access frequencies to a first cluster, instruct the first cluster to predict a hotspot index according to the access frequencies fed back by each proxy node, and register the hotspot index and a corresponding data value to a second cluster to form a first hotspot list;
A hotspot monitoring module 502, configured to monitor the first hotspot list in the second cluster, and cache the first hotspot list into a local storage space to form a second hotspot list;
A local query module 503, configured to query in the second hotspot list based on a target index carried in a data query request sent by a client when the data query request is obtained;
A hotspot feedback module 504, configured to, if the target index is in the second hotspot list, feed back a target data value corresponding to the target index stored in the local storage space to the client.
In one embodiment, the hotspot feedback module 504 is further configured to, if the target index is not in the second hotspot list, read a target data value corresponding to the target index from a database, and feed back the target data value to the client.
In one embodiment, the hotspot feedback module 504 is further configured to limit the access of the target hotspot index if the access amount of the target hotspot index reaches the upper bandwidth limit.
In one embodiment, the hotspot feedback module 504 is further configured to update the second hotspot list according to the updated first hotspot list if an update operation for the first hotspot list is generated, so that the updated second hotspot list is consistent with the updated first hotspot list.
In one embodiment, the update operations include a hotspot index change operation, an expiration operation, and a delete operation.
In one embodiment, the hotspot feedback module 504 is further configured to push the second hotspot list to other proxy nodes in the proxy cluster, so that the second hotspot list locally stored by all proxy nodes in the proxy cluster remains consistent.
The various modules in the data querying device described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one exemplary embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in FIG. 6. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and computer programs in the non-volatile storage medium. The database of the computer device stores index access frequency data. The input/output interface of the computer device exchanges information between the processor and external devices. The communication interface of the computer device communicates with external terminals through a network connection. The computer program is executed by the processor to implement a data query method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are both information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to meet the related regulations.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile memory and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. The volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processor referred to in the embodiments provided herein may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or an Artificial Intelligence (AI) processor.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the present application.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.
Claims (10)
1. A data query method, wherein the method is applied to any proxy node in a proxy cluster, the method comprising:
Periodically counting the access frequency of each index in a preset time period, sending the access frequency to a first cluster, indicating the first cluster to predict a hot spot index according to the access frequency fed back by each proxy node, and registering the hot spot index and a corresponding data value to a second cluster to form a first hot spot list;
monitoring the first hot spot list in the second cluster, and caching the first hot spot list into a local storage space to form a second hot spot list;
Under the condition that a data query request sent by a client is obtained, querying in the second hot spot list based on a target index carried in the data query request;
If the target index is in the second hot spot list, feeding back a target data value corresponding to the target index stored in the local storage space to the client.
2. The method according to claim 1, wherein the method further comprises:
If the target index is not in the second hot spot list, reading a target data value corresponding to the target index from a database, and feeding back the target data value to the client.
3. The method according to claim 1, wherein the method further comprises:
If the access amount of the target hotspot index reaches the upper bandwidth limit, limiting the access of the target hotspot index.
4. The method according to claim 1, wherein the method further comprises:
If the updating operation for the first hot spot list is generated, the second hot spot list is updated according to the updated first hot spot list, so that the updated second hot spot list is consistent with the updated first hot spot list.
5. The method of claim 4, wherein the update operation comprises a hotspot index change operation, an expiration operation, and a delete operation.
6. The method according to any one of claims 1 to 5, further comprising:
Pushing the second hot spot list to other proxy nodes in the proxy cluster so as to enable the second hot spot list locally stored by all proxy nodes in the proxy cluster to be consistent.
7. A data querying device, the device comprising:
The hot spot discovery module is used for periodically counting the access frequency of each index in a preset time period, sending the access frequency to the first cluster, indicating the first cluster to predict the hot spot index according to the access frequency fed back by each proxy node, and registering the hot spot index and the corresponding data value into the second cluster to form a first hot spot list;
the hot spot monitoring module is used for monitoring the first hot spot list in the second cluster, and caching the first hot spot list into a local storage space to form a second hot spot list;
The local query module is used for querying in the second hot spot list based on a target index carried in the data query request under the condition that the data query request sent by the client is obtained;
The hotspot feedback module is used for feeding back the target data value corresponding to the target index stored in the local storage space to the client if the target index is in the second hotspot list.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410638753.9A CN118503264A (en) | 2024-05-22 | 2024-05-22 | Data query method, apparatus, computer device, readable storage medium, and program product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410638753.9A CN118503264A (en) | 2024-05-22 | 2024-05-22 | Data query method, apparatus, computer device, readable storage medium, and program product |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118503264A true CN118503264A (en) | 2024-08-16 |
Family
ID=92228829
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410638753.9A (Pending) CN118503264A (en) | Data query method, apparatus, computer device, readable storage medium, and program product | 2024-05-22 | 2024-05-22 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118503264A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119271705A (en) * | 2024-12-10 | 2025-01-07 | Alibaba Cloud Computing Co., Ltd. (阿里云计算有限公司) | Data aggregation method, distributed system, computing device and readable storage medium |
- 2024-05-22 CN CN202410638753.9A (CN118503264A), active, Pending
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10642840B1 (en) | Filtered hash table generation for performing hash joins | |
US8065365B2 (en) | Grouping event notifications in a database system | |
US7457835B2 (en) | Movement of data in a distributed database system to a storage location closest to a center of activity for the data | |
US8762369B2 (en) | Optimized data stream management system | |
US10649903B2 (en) | Modifying provisioned throughput capacity for data stores according to cache performance | |
JP7270755B2 (en) | Metadata routing in distributed systems | |
CN111787055A (en) | A Redis-based, transaction-oriented and multi-data center data distribution method and system | |
CN112084206A (en) | Database transaction request processing method, related device and storage medium | |
CN110837423A (en) | Method and device for automatically acquiring data of guided transport vehicle | |
CN118503264A (en) | Data query method, apparatus, computer device, readable storage medium, and program product | |
CN117539915B (en) | Data processing method and related device | |
CN119201770A (en) | Data access method and device based on last-level cache | |
Sourlas et al. | Caching in content-based publish/subscribe systems | |
WO2020094064A1 (en) | Performance optimization method, device, apparatus, and computer readable storage medium | |
Guo et al. | Blockchain-assisted caching optimization and data storage methods in edge environment | |
US11741096B1 (en) | Granular performance analysis for database queries | |
JP6406254B2 (en) | Storage device, data access method, and data access program | |
CN112395453A (en) | Self-adaptive distributed remote sensing image caching and retrieval method | |
JP7458610B2 (en) | Database system and query execution method | |
CN114205368B (en) | Data storage system, control method, control device, electronic equipment and storage medium | |
CN117896275A (en) | Link tracking method and device, equipment, service node, storage medium and system | |
Huang et al. | Ceds: Center-edge collaborative data service for mobile iot data management | |
Lin et al. | Aggregate computation over data streams | |
CN119011529B (en) | Virtual machine neighbor discovery method and device in cluster environment | |
US12340104B1 (en) | Universal storage handler |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |