
CN112650755B - Data storage method, method for querying data, database, and readable medium - Google Patents


Info

Publication number
CN112650755B
CN112650755B CN202011563163.2A CN202011563163A CN112650755B CN 112650755 B CN112650755 B CN 112650755B CN 202011563163 A CN202011563163 A CN 202011563163A CN 112650755 B CN112650755 B CN 112650755B
Authority
CN
China
Prior art keywords
data
queried
stored
edge node
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011563163.2A
Other languages
Chinese (zh)
Other versions
CN112650755A (en
Inventor
黄松
沈达宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011563163.2A priority Critical patent/CN112650755B/en
Publication of CN112650755A publication Critical patent/CN112650755A/en
Application granted granted Critical
Publication of CN112650755B publication Critical patent/CN112650755B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a data storage method, a data query method, a database, and a readable medium, and relates to the technical field of computers, in particular to the technical fields of cloud computing and edge computing. The data storage method may be performed at an edge node and comprises: receiving data to be stored; caching the received data; compressing data of the same time series in the cached data; and writing the compressed data, wherein the written data includes a timestamp associated with the received data. With the data storage method and the data query method provided by the present disclosure, time series data can be efficiently stored and queried at the edge.

Description

Data storage method, method for querying data, database, and readable medium
Technical Field
The disclosure relates to the technical field of computers, in particular to the technical fields of edge computing and cloud computing, and specifically relates to a data storage method, a data query device and a readable medium.
Background
With the development of the Internet of Things and chip technology, the processing capacity of edge devices keeps growing, so edge devices handle more and more business processes. In the related art, the main functions of an edge computing platform are concentrated on device management, cloud-edge coordination, application delivery, and the like.
Disclosure of Invention
According to an aspect of the disclosed embodiments, there is provided a data storage method performed at an edge node, comprising: receiving data to be stored; caching the received data; compressing data of the same time series in the cached data; and writing the compressed data, wherein the written data includes a timestamp associated with the received data.
According to another aspect of the present disclosure, there is also provided a method for querying data, including: generating a query request, wherein the query request includes a time range of data to be queried; parsing the query request to determine an edge node associated with the data to be queried; and obtaining the data to be queried based on the information of the associated edge node and the time range.
According to another aspect of the present disclosure, there is also provided a data storage device executed at an edge node, comprising: a receiving unit configured to receive data to be stored; a caching unit configured to cache the data to be stored; a compression unit configured to compress data of the same time series among the buffered data; and a time-series data storage unit configured to write the compressed data, wherein the written data includes a time stamp associated with the written data.
According to another aspect of the present disclosure, there is also provided an apparatus for querying data, including: a query request generation unit configured to generate a query request, wherein the query request includes a time range of data to be queried; a parsing unit configured to parse the query request to determine an edge node associated with the data to be queried; and a query data acquisition unit configured to acquire the data to be queried based on the information of the associated edge node and the time range.
According to another aspect of the present disclosure, there is also provided a database including: a log storage unit configured to store data to be stored in the form of a log; a caching unit configured to cache the data to be stored; a data storage unit comprising: a meta information storage subunit configured to store meta information associated with data to be stored; an index information storage subunit configured to store index information associated with data to be stored; and a time-series data storage unit configured to store a key-value pair composed of the data to be stored and a time stamp associated with the data to be stored.
According to another aspect of the present disclosure, there is also provided a computer apparatus including: a memory, a processor and a computer program stored on the memory, wherein the processor is configured to execute the computer program to implement the steps of the method as described above.
According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method as described above.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method as described above.
By means of the scheme of the embodiment of the disclosure, time sequence data acquired by the edge device can be stored at the edge device, and data transmission pressure and network requirements between the edge device and the cloud are reduced. In addition, by storing the time sequence data at the edge equipment, the stored time sequence data can be obtained without accessing the cloud end in some cases, and the data processing pressure of the cloud end is further reduced.
Drawings
The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. In the drawings, wherein like reference numerals refer to like, but not necessarily identical, elements throughout:
FIG. 1 is a schematic diagram of an exemplary system in which various methods and apparatus described herein may be implemented, according to some exemplary embodiments of the present disclosure.
FIG. 2 shows a schematic flow chart of a data storage method performed at an edge node according to an embodiment of the disclosure;
FIG. 3 shows another schematic flow chart of a data storage method performed at an edge node according to an embodiment of the disclosure;
FIG. 4 illustrates a flow chart of a process for querying data according to an embodiment of the present disclosure;
FIG. 5 shows a schematic flow of a data query method of edge cloud collaboration according to an embodiment of the present disclosure;
FIG. 6 shows a schematic block diagram of a data storage device at an edge node according to an embodiment of the present disclosure;
FIG. 7 shows a schematic block diagram of an apparatus for querying data in accordance with an embodiment of the present disclosure;
FIG. 8 shows a schematic block diagram of a database according to an embodiment of the present disclosure;
FIG. 9 shows another schematic block diagram of a database according to an embodiment of the present disclosure; and
Fig. 10 is a schematic block diagram of an example computing device according to an example embodiment of the disclosure.
Detailed Description
In order to better understand the present disclosure, a technical solution in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present disclosure. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
The terms first and second and the like in the description and in the claims of the present disclosure and in the above-described figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Cloud computing refers to a technical system that accesses an elastically scalable pool of shared physical or virtual resources through a network, wherein the resources can include servers, operating systems, networks, software, applications, storage devices, and the like, and can be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capability for technical applications such as artificial intelligence and blockchain, and for model training. In the core network of a data center, cloud computing collects terminal data through layer-by-layer network equipment and performs big data analysis by virtue of its strong storage and computing capacity.
In contrast to cloud computing, edge computing provides cloud services and IT environment services for application developers and service providers at the edge side of a network, with the goal of providing computing, storage, and network bandwidth in close proximity to the data input or the users. Edge computing can analyze real-time and short-period data, can process and act on local data more efficiently and in real time, and can relieve data traffic in the network and the workload of the cloud.
Time series data storage offers efficient reading and writing and storage with a high compression ratio. In Internet of Things data acquisition scenarios, the huge number of device acquisition points and the high data acquisition frequency lead to high storage cost and inefficient writing, querying, and analysis; time series data storage addresses these problems. By analyzing the time series formed by the time series data, the statistical properties and regularities of the time series in the sample can be identified. Time series data therefore has important application value in the field of the Internet of Things.
With the development of the Internet of Things and chip technology, the processing capability of devices (such as edge devices) keeps growing, and more and more processing is moved onto the edge devices. However, in the related art, the edge device needs to transmit the time series data that it collects and generates to a server device in the cloud, where the data is stored in a time series database. This requires that the edge device and the cloud server remain connected over a network and that the time series data at the edge device be written directly into the cloud time series database over that network connection. Because the network connection at the edge device cannot be kept stable all the time, and the server in the cloud may also malfunction, there is a risk that the data to be written into the cloud by the edge device is lost.
Once the network connection between the edge device and the cloud or the server itself of the cloud fails, the risk of data write failure needs to be reduced by adding retry or cache logic in the code. Although this approach can reduce the risk of write failure to some extent, it is also unavoidable that data is lost if a network failure occurs for a longer period of time.
In addition, the data may also be stored by building a database on the edge device. However, due to limited storage and computing capabilities of the edge devices, it is difficult to efficiently store and query large amounts of time series data at the edge devices.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, in accordance with an embodiment of the present disclosure. As shown in fig. 1, the system 100 includes a user terminal 101, a server 102, and an edge device 103.
For example, the server 102 may provide other services or software applications that may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to the user terminal 101 under a software as a service (SaaS) model. In some examples, server 102 may be an edge computing system cloud.
In the configuration shown in fig. 1, server 102 may include one or more components that implement the functions performed by server 102. These components may include software components, hardware components, or a combination thereof that are executable by one or more processors. The user of user terminal 101 may in turn utilize one or more client applications to interact with server 102 to utilize the services provided by these components. It should be appreciated that a variety of different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user can use the user terminal 101 to upload and manage application modules, and the like. The user terminal 101 may provide an interface that enables a user of the user terminal 101 to interact with the user terminal 101. The user terminal 101 may also output information to the user via the interface.
By way of example, the user terminal 101 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and the like. These computer devices may run various types and versions of software applications and operating systems, such as Microsoft Windows, Apple iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., Google Chrome OS); or include various mobile operating systems such as Microsoft Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablet computers, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head mounted displays and other devices. The gaming system may include various handheld gaming devices, Internet-enabled gaming devices, and the like. The user terminal is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.
By way of example, the server 102 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, midrange servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 102 may include one or more virtual machines running a virtual operating system, or other computing architecture that involves virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices of the server). In various embodiments, server 102 may run one or more services or software applications that provide the functionality described below.
Illustratively, the computing units in server 102 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems. Server 102 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, etc.
For example, the server 102 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the user terminals 101. Server 102 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of user terminal 101.
Illustratively, edge device 103 is a device that provides an entry point to an enterprise or service provider core network, which may be, for example, a router, a routing switch, an Integrated Access Device (IAD), a multiplexer, or any of various Metropolitan Area Network (MAN) and Wide Area Network (WAN) access devices. In other examples, edge devices 103 may also include, but are not limited to, intelligent routers, intelligent speakers, Network Attached Storage (NAS), webcams, storable network devices, smart watches, smart televisions, monitors, and the like.
It will be appreciated that the user terminal 101, the server 102 and the edge device 103 may communicate over a network. The network may be any type of network known to those skilled in the art that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, the network may be a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the Internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, Wi-Fi), and/or any combination of these and/or other networks.
Fig. 2 shows a schematic flow chart of a data storage method performed at an edge node according to an embodiment of the present disclosure. With the method 200 shown in fig. 2, the time series data generated at the edge node can be stored in the form of a time series database. In some embodiments, the data stored using method 200 may be data generated or collected by a single edge node. In other embodiments, the data stored using method 200 may be data generated by a plurality of edge nodes within a certain distance of one another.
In step S202, data to be stored may be received. The received data to be stored may be time series data. The time series data may be information that varies with time and may be used to reflect how a quantity changes over time. For example, the time series data may be monitoring data for certain information.
In some embodiments, the received data to be stored may be monitoring data generated at or collected by an edge device that is an edge node. For example, the edge device may be an in-vehicle electronic system, and the data to be stored may be various vehicle travel data (such as a travel speed, acceleration, travel route, etc. of the vehicle) collected by the in-vehicle electronic system through a sensor. For another example, the edge device may be a wearable device, and the data to be stored may be various physiological information of the human body (such as blood pressure, body temperature, heart beat, etc. of the human body) collected by the wearable device through the sensor.
In other embodiments, the data to be stored is sent by the other edge nodes to the current edge node for storage.
In step S204, the received data may be buffered.
In some embodiments, the data received by the edge node may be buffered in the memory, and the buffered data is accumulated to a predetermined data amount and then written to the disk. Since more resources are consumed for writing to the disk than for writing to the memory, the number of times of writing to the disk is reduced in the above manner, and the writing performance of the edge device can be improved.
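The buffering described above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the class name WriteBuffer, the record layout, and the use of a record count as the "predetermined data amount" are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical record layout: (series_id, timestamp, value).
Point = Tuple[str, int, float]


class WriteBuffer:
    """Accumulates points in memory and hands them to a sink in batches."""

    def __init__(self, flush_threshold: int, sink: Callable[[List[Point]], None]) -> None:
        self.flush_threshold = flush_threshold  # assumed "predetermined data amount"
        self.sink = sink                        # stands in for the disk write
        self.points: List[Point] = []

    def add(self, point: Point) -> None:
        self.points.append(point)
        if len(self.points) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # One sink call per batch instead of one per point.
        if self.points:
            batch, self.points = self.points, []
            self.sink(batch)


batches: List[List[Point]] = []
buffer = WriteBuffer(flush_threshold=3, sink=batches.append)
for ts in range(7):
    buffer.add(("sensor-1", ts, 20.0 + ts))
```

With a threshold of three, the seven points above reach the sink as two batches, and one point remains in memory until the next flush.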
Furthermore, by caching the received data in memory, data having the same properties can be brought together. For example, data received at the same time may be aggregated together, or data from the same object may be aggregated together.
In step S206, the data of the same time series in the buffered data may be compressed.
In some embodiments, the data of the same time sequence may be data from the same object or data having the same properties. When querying data, data analysis processing can be provided for the data of the same time sequence to obtain data analysis results aiming at a certain object or a certain characteristic. For example, statistics such as an average value and a sum of data of the same time series can be obtained by data analysis processing.
In some implementations, the data may be compressed using various efficient coding schemes. For example, data compression may be achieved using predictive coding, transform coding, vector quantization coding, subband coding, neural network coding, and the like.
Compression of the data may be enhanced by merging together the data of the same time series, thereby making the data more suitable for storage at edge nodes where both storage and computing capabilities are limited.
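As a rough illustration of why grouping helps, the sketch below groups cached points by series and then compresses each series. The specific scheme (delta encoding followed by zlib) is an assumption chosen for brevity; delta encoding is only a very simple form of predictive coding, and the disclosure does not prescribe it. Integer values and the function names are likewise hypothetical.

```python
import struct
import zlib
from collections import defaultdict
from itertools import accumulate
from typing import Dict, List, Tuple

Sample = Tuple[int, int]  # (timestamp, integer value); integer values are an assumption


def group_by_series(points: List[Tuple[str, int, int]]) -> Dict[str, List[Sample]]:
    """Bring together samples of the same series, sorted by timestamp."""
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for series_id, ts, value in points:
        groups[series_id].append((ts, value))
    return {sid: sorted(samples) for sid, samples in groups.items()}


def _deltas(xs: List[int]) -> List[int]:
    # Keep the first element; replace the rest by the difference to the predecessor.
    return xs[:1] + [b - a for a, b in zip(xs, xs[1:])]


def compress_series(samples: List[Sample]) -> bytes:
    ts = _deltas([t for t, _ in samples])
    vs = _deltas([v for _, v in samples])
    raw = struct.pack(f"<{len(ts) + len(vs)}q", *ts, *vs)
    return zlib.compress(raw)


def decompress_series(blob: bytes) -> List[Sample]:
    raw = zlib.decompress(blob)
    n = len(raw) // 16  # two 8-byte integers per sample
    nums = struct.unpack(f"<{2 * n}q", raw)
    # Running sums undo the delta encoding.
    return list(zip(accumulate(nums[:n]), accumulate(nums[n:])))
```

Within one series, consecutive timestamps and values tend to differ by small, repetitive amounts, so the deltas compress far better than interleaved points from unrelated series would.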
In step S208, the compressed data may be written, wherein the written data may include a timestamp associated with the received data.
In some embodiments, the time stamp associated with the data received in step S202 may be stored in association with the value of the data.
By using the data storage method executed at the edge node, the time sequence data can be stored at the edge node, so that the problem that the data at the edge node cannot be synchronized to the cloud end when network connection or the cloud end fails can be solved. By caching the data in the memory, the read-write operation at the edge node can be reduced, and the write-in performance at the edge node can be improved. By compressing the data of the same time sequence, the storage capacity at the edge node can be improved.
Fig. 3 shows another schematic flow chart of a data storage method performed at an edge node according to an embodiment of the present disclosure.
As shown in fig. 3, in step S302, time series data may be written. The received data to be stored may be processed using steps S202 to S208 described in connection with fig. 2, and the processed time series data written to the disk at the edge node.
Various processes can be performed on the data stored at the edge node using steps S304 to S308. Although the operations performed on the written time series data are shown in the form of a flowchart 300 in fig. 3, one skilled in the art may perform steps S304 to S308 in a different order, or repeat or omit at least one of steps S304, S306, and S308, according to actual situations.
In some embodiments, the written time series data may include a key value pair of a time stamp associated with the data to be stored and a numerical value of the data. The key value pair can be used for efficiently storing time sequence data and facilitating later inquiry of the time sequence data.
In some embodiments, the written data may also include meta information and index information associated with the received data. The meta information can be used for fast screening of the data to be queried. For example, the meta information may be queried to quickly determine whether data having the same meta information as the data to be queried is stored in the edge node. If it is determined that the meta information of the data stored in the edge node is different from the meta information of the data to be queried, the step of reading the data stored on the disk may be omitted. The index information may be used to quickly locate, at query time, stored data of the same nature as the data to be queried. The meta information may include metric information and field information of the received data. A metric in a time series database is similar to a table in a relational database, and the field information may be used to represent the fields that hold the measured values of the data. The index information may include identification information associated with the received data. For example, the index information may be used to represent identification (ID) information of a device that collects or generates the received data. In some examples, time series data having the same index information may be stored sequentially and compressed by various efficient data encoding schemes.
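The roles of meta information and index information at query time can be sketched as below. The class EdgeStore, its attribute names, and the in-memory list standing in for blocks on disk are assumptions made for illustration; the disclosure does not specify these structures.

```python
from typing import Dict, List, Set, Tuple

Sample = Tuple[int, float]  # (timestamp, value)


class EdgeStore:
    def __init__(self) -> None:
        self.meta: Set[Tuple[str, str]] = set()                  # known (metric, field) pairs
        self.index: Dict[Tuple[str, str, str], List[int]] = {}   # series key -> block positions
        self.blocks: List[List[Sample]] = []                     # stand-in for blocks on disk
        self.disk_reads = 0                                      # counts simulated disk reads

    def write(self, metric: str, field: str, device_id: str, samples: List[Sample]) -> None:
        self.meta.add((metric, field))
        self.index.setdefault((metric, field, device_id), []).append(len(self.blocks))
        self.blocks.append(list(samples))

    def query(self, metric: str, field: str, device_id: str) -> List[Sample]:
        # Screening: unknown meta information means no stored data can match,
        # so the disk is never touched.
        if (metric, field) not in self.meta:
            return []
        result: List[Sample] = []
        # Locating: the index points directly at the blocks of this device.
        for position in self.index.get((metric, field, device_id), []):
            self.disk_reads += 1
            result.extend(self.blocks[position])
        return result
```

A query for a metric the node has never stored returns immediately, while a query for a known series reads only the blocks recorded for that device ID.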
In step S304, the received data may be synchronized to the cloud. By synchronizing the received data to the cloud, it is possible to reduce the size of the data amount of the data stored at the edge node and to relieve the storage pressure at the edge node. Further, a synchronization rate to synchronize the received data to the cloud may be determined based on the usage information of the edge node. By performing synchronization during periods when the edge node operating pressure is small, the data read-write pressure at the edge node can be reduced.
In some embodiments, the synchronized data is in the form of a log. In some implementations, in addition to storing data received at the edge node as time series data according to the method described in connection with fig. 2, the data received at the edge node may also be written to disk in the form of a log. For example, the received data may be written to disk based on a write-ahead log (WAL) technique. Such log-form data retains the original information of the received data. Since the log-form data is not subjected to complicated encoding and compression processes, various information (e.g., meta information, index information, etc.) associated with the data can be easily extracted from the log-form data.
Furthermore, data written to disk in log form may also be used as a backup of time-series data stored at the edge node. If a device at an edge node encounters a failure and needs to be restarted, it may happen that data in the cache is lost, and data that has not yet been compression encoded is lost. In this case, the data can be restored from the log.
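A minimal sketch of log-based backup and recovery is shown below. The on-disk format (one JSON object per line), the file name, and the function names are assumptions for illustration only; the disclosure does not define the log format.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def wal_append(path: str, record: Dict[str, Any]) -> None:
    """Append one record and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def wal_replay(path: str) -> List[Dict[str, Any]]:
    """Read back all complete records, e.g. after a restart."""
    if not os.path.exists(path):
        return []
    records: List[Dict[str, Any]] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                break  # stop at a partially written trailing record
    return records


with tempfile.TemporaryDirectory() as tmp:
    wal_path = os.path.join(tmp, "edge.wal")  # hypothetical file name
    wal_append(wal_path, {"device": "sensor-1", "ts": 1, "value": 20.5})
    wal_append(wal_path, {"device": "sensor-1", "ts": 2, "value": 20.7})
    recovered = wal_replay(wal_path)  # what a restarted node would reload
```

Because each record is stored unencoded, fields such as the device ID can be read straight from the log, which is also what makes the log convenient to parse on the cloud side.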
In some examples, the synchronization rate may be set to a lower value during busy periods of the edge node and to a higher value during idle periods of the edge node.
As previously described, in embodiments provided by the present disclosure, data received by an edge node may be stored at the edge node in the form of time series data without being transferred to the cloud for storage in real time. Therefore, the synchronization rate of the data can be set according to the usage information of the edge node to reduce the read-write pressure on the edge node. For example, the synchronization rate may be set to a lower value during the daytime period and to a higher value during the night time period.
In some implementations, log data may be sent to the cloud by streaming. The cloud can parse the log data and write the data into a time series database in the cloud. The cloud time series database may be any time series database, such as InfluxDB or OpenTSDB.
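One possible way to derive a synchronization rate from usage information is sketched below. The choice of CPU utilization as the usage signal, the linear mapping, and the rate bounds are all assumptions; the disclosure only states that the rate is lower when the node is busy and higher when it is idle.

```python
def sync_rate(utilization: float, min_rate: int = 10, max_rate: int = 1000) -> int:
    """Map node utilization in [0, 1] to a sync rate in records per second.

    An idle node (0.0) syncs at max_rate; a fully busy node (1.0) at min_rate.
    """
    u = min(max(utilization, 0.0), 1.0)  # clamp out-of-range readings
    return int(round(max_rate - u * (max_rate - min_rate)))
```

A time-of-day schedule, such as the daytime and night-time example above, could replace the utilization reading with a lookup keyed on the current hour.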
In step S306, the data before a predetermined time in the written data is cleared by an expiration mechanism. By reducing the amount of data stored at the edge node, the storage pressure at the edge node may be reduced.
Because the storage capacity of the device at the edge node is limited, it cannot store massive amounts of data like a cloud time series database, and it generally cannot be expanded. Part of the data stored at the edge node can therefore be cleared through a preset expiration mechanism.
In some embodiments, the preset expiration mechanism may include clearing data prior to a predetermined time period. For example, portions of the data stored at the edge nodes (e.g., data stored a week ago) may be purged daily, every three days, or weekly (or any other predefined period of time).
In other embodiments, the preset expiration mechanism may include performing a purge of data prior to the predetermined time when the amount of stored data exceeds a predetermined data amount threshold.
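The two expiration variants described above can be sketched as follows. The representation of stored data as a list of `(timestamp, value)` pairs and the function names are assumptions made for the example.

```python
def purge_expired(points, now, max_age_seconds):
    """Keep only points whose timestamp is not older than max_age_seconds."""
    cutoff = now - max_age_seconds
    return [(timestamp, value) for (timestamp, value) in points if timestamp >= cutoff]


def purge_if_over_threshold(points, now, max_age_seconds, max_points):
    """Purge old points only when the stored amount exceeds max_points."""
    if len(points) <= max_points:
        return list(points)
    return purge_expired(points, now, max_age_seconds)
```

The first function corresponds to a purge that runs on a fixed schedule; the second corresponds to a purge triggered by a data amount threshold.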
In step S308, the method 300 may further include determining a plurality of files, included in the written data, that are smaller than a predetermined file size, and merging the plurality of files smaller than the predetermined file size. By merging a plurality of small files into one large file, the number of stored files traversed during a query can be reduced, and the query efficiency is improved.
In some cases, if less data is received at the edge node, it may be the case that the data is written to disk in the form of multiple smaller-sized files. In this case, since the number of files is large, the number of files that need to be traversed when performing a data query is also large, thereby making the query efficiency low. In order to improve efficiency of querying data, when storing data, a plurality of files smaller than a predetermined file size may be combined into one file to reduce the number of stored files.
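A minimal sketch of such a merge is shown below, assuming that the stored files can simply be concatenated. A real storage engine would also have to rewrite its index, which is omitted here. The function name, the `merged.dat` file name, and the size limit are hypothetical.

```python
import os
import tempfile


def merge_small_files(directory, small_file_limit, merged_name="merged.dat"):
    """Concatenate all files smaller than small_file_limit bytes into one file."""
    small = sorted(
        name
        for name in os.listdir(directory)
        if name != merged_name
        and os.path.isfile(os.path.join(directory, name))
        and os.path.getsize(os.path.join(directory, name)) < small_file_limit
    )
    if len(small) < 2:
        return []  # nothing worth merging
    with open(os.path.join(directory, merged_name), "ab") as merged_file:
        for name in small:
            path = os.path.join(directory, name)
            with open(path, "rb") as source:
                merged_file.write(source.read())
            os.remove(path)  # the small file is now part of the merged file
    return small


def demo_merge():
    """Create two small files and one large file, then merge the small ones."""
    with tempfile.TemporaryDirectory() as directory:
        for name, size in (("a.dat", 2), ("b.dat", 3), ("c.dat", 100)):
            with open(os.path.join(directory, name), "wb") as handle:
                handle.write(b"x" * size)
        merged = merge_small_files(directory, small_file_limit=10)
        remaining = sorted(os.listdir(directory))
        merged_size = os.path.getsize(os.path.join(directory, "merged.dat"))
        return merged, remaining, merged_size
```

After the merge, a query has to open two files instead of three in this example, which is the effect the paragraph above describes.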
Fig. 4 shows a flowchart of a process for querying data according to an embodiment of the present disclosure. The data query flow 400 shown in fig. 4 may be implemented using an edge device serving as an edge node.
As shown in fig. 4, in step S402, a query request may be generated, wherein the query request includes a time range of data to be queried.
In some embodiments, the query request may include identification information associated with the data to be queried. In some implementations, the query request may be used to query for a value of data within a predetermined period of time. In other implementations, the query request may be used to query numerical processing results (e.g., sum, average, standard deviation, etc.) for data within a predetermined period of time.
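The two kinds of query mentioned above (raw values in a time range, or a numerical processing result over that range) might look like the following sketch. The data layout as `(timestamp, value)` pairs and the operation names `"sum"`, `"average"`, and `"stdev"` are assumptions made for the example; the population standard deviation is used here, although the disclosure does not say which variant is intended.

```python
import statistics


def query_values(points, start, end):
    """Return the values whose timestamp lies in the closed range [start, end]."""
    return [value for (timestamp, value) in points if start <= timestamp <= end]


def query_aggregate(points, start, end, operation):
    """Return a numerical processing result for the values in [start, end]."""
    values = query_values(points, start, end)
    if not values:
        return None  # no data in the requested time range
    if operation == "sum":
        return sum(values)
    if operation == "average":
        return statistics.fmean(values)
    if operation == "stdev":
        return statistics.pstdev(values)
    raise ValueError(f"unsupported operation: {operation}")
```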
In some embodiments, the query request may be generated in response to an input by a user or in response to a request from a user terminal. Parameters of the query request, such as a time frame associated with the query request, identification information, etc., may be determined in response to content entered by a user or a request from a user terminal. In other embodiments, the query request may be generated based on a pre-set rule. For example, a command for generating a query request may be written in an application installed in an edge device or a user terminal to generate a query request for data at a predetermined timing or in response to the occurrence of a preset event.
In step S404, the query request may be parsed to determine edge nodes associated with the data to be queried.
In some embodiments, the edge node may obtain parameters of the query request, such as a time range associated with the query request, identification information, and/or numerical processing results, etc., by parsing the query request.
In some implementations, an edge node associated with data to query can be determined based on identification information associated with a query request. In some examples, the identification information associated with the query request indicates that the data to be queried is collected or generated by the device indicated by the identification information. Thus, the data to be queried may be stored in the disk of the device indicated by the identification information.
In step S406, data to be queried may be obtained based on the information and time range of the associated edge node.
In response to determining that the edge nodes associated with the data to be queried include at least two edge nodes, the cloud corresponding to the determined at least two edge nodes is accessed to obtain the data to be queried. Since the data to be queried involves at least two edge nodes, it is difficult to obtain the data to be queried by accessing a database at a single edge node. Because the data at the edge nodes is synchronized to the cloud database for storage, in this case, the data to be queried may be obtained by accessing the cloud corresponding to the determined at least two edge nodes.
In response to determining that the time range exceeds a preset time range threshold, the cloud corresponding to the edge node is accessed to obtain the data to be queried. As previously described, due to the limited storage capacity of the edge node, data stored in the edge node prior to a predetermined time may be purged based on an expiration mechanism. Thus, if the time range associated with the query request exceeds the preset time range threshold, the data to be queried may already have been cleared from the edge node. In this case, the data to be queried can be obtained by accessing the cloud corresponding to the edge node.
In response to determining that the edge node associated with the data to be queried includes a single edge node and the time range does not exceed a preset time range threshold, the single edge node is accessed to obtain the data to be queried. As previously described, in the event that the data to be queried is associated with a single edge node and the time range does not exceed a preset time range threshold, the data to be queried may be acquired at the associated edge node.
In the case where the data to be queried is stored at the edge node, the data to be queried is obtained by accessing the edge node without accessing the cloud server. Because the network delay of the user for accessing the edge node is smaller than that of the cloud server, better query performance can be achieved.
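The three conditions of step S406 can be condensed into a small decision function, sketched below. Modeling the time range as a number of seconds reaching back from the present, and the names `choose_query_target`, `lookback_seconds`, and `threshold_seconds`, are assumptions made for illustration.

```python
def choose_query_target(edge_node_ids, lookback_seconds, threshold_seconds):
    """Return "cloud" or "edge" depending on where the query should be sent."""
    if not edge_node_ids:
        raise ValueError("the query must be associated with at least one edge node")
    # Data spread over at least two edge nodes is fetched from the cloud.
    if len(edge_node_ids) >= 2:
        return "cloud"
    # Data older than the threshold may already have been purged at the edge.
    if lookback_seconds > threshold_seconds:
        return "cloud"
    # Single edge node and recent data: query the edge node directly.
    return "edge"
```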
Fig. 5 shows a schematic flow of a data query method of edge cloud collaboration according to an embodiment of the present disclosure. In some embodiments, the steps shown in fig. 5 may be performed by an edge node.
As shown in fig. 5, at step S501, the query starts.
At step S502, it may be determined, by parsing the query request, whether the query request is directed to the time-series database at the cloud or to the edge time-series database at the edge node.
In the case where it is determined that the query request is directed to the time-series database at the cloud, the method may proceed to step S503 to initiate a query to the time-series database at the cloud.
In the case where it is determined that the query request is not directed to the time-series database at the cloud, the method may proceed to step S504 to initiate a query to the edge time-series database.
In step S505, it may be determined whether the data to be queried by the query request relates to a plurality of edge nodes.
In the case where it is determined that the data to be queried involves a plurality of edge nodes, the method may proceed to step S503 to initiate a query to the time-series database at the cloud.
In the case where it is determined that the data to be queried does not involve a plurality of edge nodes, the method may proceed to step S506.
In step S506, it may be determined whether the time range to be queried by the query request exceeds the data expiration time of the edge node.
In the case that it is determined that the time range to be queried exceeds the data expiration time of the edge node, the method may proceed to step S503 to initiate a query to the time-series database of the cloud.
In case it is determined that the time range to be queried does not exceed the data expiration time of the edge node, the method may proceed to step S507.
In step S507, it may be determined whether the data to be queried is stored at the edge node that is currently executing the method.
In the event that it is determined that the data to be queried is stored at the current node, the method may proceed to step S508 to query the time-series data storage engine of the current node to obtain a query result.
In the case where it is determined that the data to be queried is stored at another edge node, the method may proceed to step S509 to initiate a query to the other edge node indicated in the query request.
In step S510, a query result may be returned to the calling end. For example, the query result may be sent to an output device of the user terminal or the edge node.
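The branching of fig. 5 can be traced with the sketch below, which returns the label of the step at which the query is finally executed. The `QueryRequest` fields and the function name `route_query` are hypothetical; only the step labels and the order of the checks are taken from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryRequest:
    directed_to_cloud: bool   # checked in S502
    node_ids: List[str]       # edge nodes holding the data, checked in S505
    lookback_seconds: int     # how far back the time range reaches, checked in S506


def route_query(request, current_node_id, expiration_seconds):
    """Return the step that executes the query: "S503", "S508" or "S509"."""
    if not request.node_ids:
        raise ValueError("the query must name at least one edge node")
    # S502: a request directed to the cloud goes straight to the cloud database.
    if request.directed_to_cloud:
        return "S503"
    # S504/S505: an edge query spanning several edge nodes is sent to the cloud.
    if len(request.node_ids) > 1:
        return "S503"
    # S506: data older than the edge expiration time is only kept in the cloud.
    if request.lookback_seconds > expiration_seconds:
        return "S503"
    # S507: decide between the local storage engine and another edge node.
    if request.node_ids[0] == current_node_id:
        return "S508"
    return "S509"
```

Whichever step executes the query, the result is then returned to the caller as in step S510.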
Fig. 6 shows a schematic block diagram of a data storage device at an edge node according to an embodiment of the present disclosure. As shown in fig. 6, the data storage device 600 may include a receiving unit 610, a buffering unit 620, a compressing unit 630, and a time-sequential data storage unit 640.
The receiving unit 610 may be configured to receive data to be stored. The buffering unit 620 may be configured to buffer the received data. The compression unit 630 may be configured to compress the same time series of data in the buffered data. The time-series data storage unit 640 may be configured to write compressed data, wherein the written data includes a time stamp associated with the received data.
The operations of the units 610 to 640 of the data storage device 600 are similar to those of the steps S202 to S208 described above, and are not repeated here.
By using the data storage device running at the edge node, time-series data can be stored at the edge node, which addresses the problem that data at the edge node cannot be synchronized to the cloud when the network connection or the cloud fails. By caching the data in memory, the number of read-write operations at the edge node can be reduced, and the write performance at the edge node can be improved. By compressing data of the same time series, the storage capacity at the edge node can be improved.
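To illustrate why grouping data of the same time series helps compression, the sketch below groups cached points by series, delta-encodes the timestamps, and compresses the result. The disclosure does not name a compression algorithm; delta encoding combined with `zlib`, the JSON payload, and all function names are assumptions chosen for this example.

```python
import json
import zlib
from collections import defaultdict


def group_by_series(cached_points):
    """Group cached (series_id, timestamp, value) tuples by series."""
    groups = defaultdict(list)
    for series_id, timestamp, value in cached_points:
        groups[series_id].append((timestamp, value))
    return dict(groups)


def compress_series(points):
    """Delta-encode the timestamps of one series and compress the result."""
    ordered = sorted(points)
    deltas, previous = [], 0
    for timestamp, _ in ordered:
        deltas.append(timestamp - previous)  # small, repetitive numbers
        previous = timestamp
    payload = json.dumps({"dt": deltas, "v": [value for _, value in ordered]})
    return zlib.compress(payload.encode("utf-8"))


def decompress_series(blob):
    """Invert compress_series and return the sorted (timestamp, value) pairs."""
    data = json.loads(zlib.decompress(blob).decode("utf-8"))
    points, timestamp = [], 0
    for delta, value in zip(data["dt"], data["v"]):
        timestamp += delta
        points.append((timestamp, value))
    return points
```

Points of one series sampled at a regular interval produce nearly identical deltas, which a general-purpose compressor encodes compactly.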
Fig. 7 shows a schematic block diagram of an apparatus for querying data according to an embodiment of the disclosure. As shown in fig. 7, the apparatus 700 may include a query request generation unit 710, a parsing unit 720, and a query data acquisition unit 730.
The query request generation unit 710 may be configured to generate a query request, wherein the query request comprises a time range of data to be queried. The parsing unit 720 may be configured to parse the query request to determine edge nodes associated with the data to be queried. The query data acquisition unit 730 may be configured to obtain data to be queried based on information and time ranges of the associated edge nodes.
The operation of the units 710 to 730 of the apparatus 700 is similar to the operation of the steps S402 to S406 described above, respectively, and will not be described again.
Fig. 8 shows a schematic block diagram of a database according to an embodiment of the present disclosure. The database shown in fig. 8 may be provided at an edge node and may be used to store data collected or generated by at least one edge node.
As shown in fig. 8, the database 800 may include a log storage unit 810. The log storage unit 810 may write the received data in the form of a log. For example, the received data may be written to disk based on a write-ahead log (WAL) technique. The data stored in the log storage unit 810 may be used for backup and for synchronization to the cloud.
Database 800 may also include a cache unit 820. The cache unit 820 may be used to cache the received data. In some embodiments, the data received by the edge node may be cached in memory, and the cached data is written to the disk after it has accumulated to a predetermined data amount.
Database 800 may also include a data storage unit 830. The data storage unit 830 may include a meta information storage subunit 831, an index information storage subunit 832, and a time sequence data storage subunit 833. Among them, meta information may include Metric (Metric) information and Field (Field) information of received data. The index information may include identification information associated with the received data. For example, the index information may be used to represent Identification (ID) information of a device for collecting or generating the received data. The time series data may include a key value pair of a time stamp associated with the data to be stored and a numerical value of the data.
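The three kinds of stored information can be pictured with the following data structures. The class and attribute names are hypothetical, and an in-memory dictionary stands in for the on-disk key-value storage.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MetaInfo:
    metric: str       # e.g. the name of the measured quantity
    field_name: str   # e.g. the name of the recorded field


@dataclass
class IndexInfo:
    device_id: str    # ID of the device that collected or generated the data


@dataclass
class SeriesRecord:
    meta: MetaInfo
    index: IndexInfo
    # Time-series data: key-value pairs of timestamp -> numerical value.
    points: Dict[int, float] = field(default_factory=dict)

    def write(self, timestamp, value):
        """Store one key-value pair of timestamp and value."""
        self.points[timestamp] = value
```

For example, a temperature series from one sensor could be built and filled as `record = SeriesRecord(MetaInfo("temperature", "celsius"), IndexInfo("sensor-1"))` followed by `record.write(1608854400, 21.5)`.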
Fig. 9 shows another schematic block diagram of a database according to an embodiment of the present disclosure.
As shown in fig. 9, the database 900 may include a log storage unit 910, a cache unit 920, a data storage unit 930, an edge-cloud collaboration unit 940, and a data maintenance unit 950. The log storage unit 910, the cache unit 920, and the data storage unit 930 may be implemented by the log storage unit 810, the cache unit 820, and the data storage unit 830 shown in fig. 8, which are not described again here.
The edge-cloud collaboration unit 940 may be configured to synchronize the data stored in the log storage unit 910 to a server in the cloud. A synchronization rate for synchronizing the received data to the cloud may be determined based on the usage information of the edge node.
The data maintenance unit 950 may be configured to clear data before a predetermined time in the written data through an expiration mechanism. The data maintenance unit 950 may also determine a plurality of files smaller than a predetermined file size included in the written data and combine the plurality of files smaller than the predetermined file size in some embodiments.
The present disclosure also provides a computer device comprising: a memory, a processor and a computer program stored on the memory, wherein the processor is configured to execute the computer program to implement the steps of the method as described above.
The exemplary embodiments of the present disclosure also provide a non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method as described previously.
The exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program realizes the steps of the method as described before when being executed by a processor.
An example of such an electronic device and computer-readable storage medium is described below with reference to fig. 10.
Fig. 10 illustrates an example configuration of a computing device 1000 as an electronic device that may be used to implement the modules and functions described herein. Computing device 1000 may be a variety of different types of devices, such as a server of a service provider, a device associated with a user terminal (e.g., a client device), a system-on-chip, and/or any other suitable computing device or computing system. Examples of computing device 1000 include, but are not limited to: a desktop computer, a server computer, a notebook computer or netbook computer, a mobile device (e.g., tablet or phablet device, a cellular or other wireless telephone (e.g., smart phone), notepad computer, mobile station), a wearable device (e.g., glasses, watch), an entertainment device (e.g., an entertainment appliance, a set-top box communicatively coupled to a display device, a gaming machine), a television or other display device, an automotive computer, and so forth. Accordingly, computing device 1000 may range from full resource devices (e.g., personal computers, game consoles) that have significant memory and processor resources, to low-resource devices with limited memory and/or processing resources (e.g., traditional set-top boxes, hand-held game consoles).
The computing device 1000 may include at least one processor 1002, memory 1004, communication interface(s) 1006, display device 1008, other input/output (I/O) devices 1010, and one or more mass storage devices 1012, capable of communicating with each other, such as through a system bus 1014 or other suitable connection.
The processor 1002 may be a single processing unit or multiple processing units, all of which may include a single or multiple computing units or multiple cores. The processor 1002 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. The processor 1002 may be configured to, among other capabilities, obtain and execute computer-readable instructions stored in the memory 1004, mass storage device 1012, or other computer-readable medium, such as program code for the operating system 1016, program code for the application programs 1018, program code for other programs 1020, and the like.
Memory 1004 and mass storage device 1012 are examples of computer storage media for storing instructions that are executed by processor 1002 to implement the various functions as previously described. For example, the memory 1004 may generally include both volatile memory and nonvolatile memory (e.g., RAM, ROM, etc.). In addition, mass storage device 1012 may generally include hard drives, solid state drives, removable media, including external and removable drives, memory cards, flash memory, floppy disks, optical disks (e.g., CD, DVD), storage arrays, network attached storage, storage area networks, and the like. Memory 1004 and mass storage device 1012 may both be referred to herein collectively as memory or a computer storage medium, and may be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that may be executed by processor 1002 as a particular machine configured to implement the operations and functions described in the examples herein.
A number of program modules may be stored on the mass storage device 1012. These programs include an operating system 1016, one or more application programs 1018, other programs 1020, and program data 1022, which can be loaded into the memory 1004 for execution. Examples of such application programs or program modules may include, for example, computer program logic (e.g., computer program code or instructions) for implementing the following components/functions: the receiving unit 610, the buffering unit 620, the compressing unit 630, and the time-series data storage unit 640 described in connection with fig. 6; the query request generation unit 710, the parsing unit 720, and the query data acquisition unit 730 described in connection with fig. 7; the log storage unit 810, the cache unit 820, and the data storage unit 830 described in connection with fig. 8; the log storage unit 910, the cache unit 920, the data storage unit 930, the edge-cloud collaboration unit 940, and the data maintenance unit 950 described in connection with fig. 9; the method 200, the method 300, and the method 400 described in connection with figs. 2-4; and/or further embodiments described herein.
Although illustrated in fig. 10 as being stored in memory 1004 of computing device 1000, modules 1016, 1018, 1020, and 1022, or portions thereof, may be implemented using any form of computer readable media accessible by computing device 1000. As used herein, "computer-readable medium" includes at least two types of computer-readable media, namely computer storage media and communication media.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information for access by a computing device.
In contrast, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism. Computer storage media as defined herein do not include communication media.
Computing device 1000 may also include one or more communication interfaces 1006 for exchanging data with other devices, such as via a network, direct connection, or the like, as previously discussed. Such communication interfaces may be one or more of the following: any type of network interface (e.g., a Network Interface Card (NIC)), a wired or wireless interface (such as an IEEE 802.11 wireless LAN (WLAN) interface), a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, etc. The communication interface 1006 may facilitate communications within a variety of network and protocol types, including wired networks (e.g., LAN, cable, etc.) and wireless networks (e.g., WLAN, cellular, satellite, etc.), the Internet, and so forth. The communication interface 1006 may also provide communication with external storage devices (not shown) such as in a storage array, network attached storage, storage area network, or the like.
In some examples, a display device 1008, such as a monitor, may be included for displaying information and images to a user. Other I/O devices 1010 may be devices that receive various inputs from a user and provide various outputs to the user, and may include touch input devices, gesture input devices, cameras, keyboards, remote controls, mice, printers, audio input/output devices, and so on.
While the disclosure has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative and schematic and not restrictive; the present disclosure is not limited to the disclosed embodiments. Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps than those listed and the indefinite article "a" or "an" does not exclude a plurality, and the term "plurality" means two or more. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims (14)

1. A method performed at an edge node, comprising:
receiving data to be stored, wherein the received data is written to a disk in the form of a log;
caching the received data;
compressing data of the same time series in the cached data;
writing the compressed data to a disk of the edge node, wherein the written data comprises a key-value pair consisting of the data to be stored and a timestamp associated with the data to be stored; and
synchronizing the received data to a cloud, wherein the synchronized data is in the form of a log;
wherein, when data to be queried is associated with the edge node and a time range of the data to be queried does not exceed a preset time range threshold, the data to be queried is obtained from the edge node, and when the data to be queried is associated with at least two edge nodes, the cloud is accessed to obtain the data to be queried.
2. The method of claim 1, wherein the written data further comprises meta information and index information associated with the received data, wherein the meta information comprises metric information and field information of the received data, and the index information comprises identification information associated with the received data.
3. The method of any of claims 1-2, further comprising:
determining a synchronization rate for synchronizing the received data to the cloud based on the usage information of the edge node.
4. The method of any of claims 1-2, further comprising:
clearing, by an expiration mechanism, data before a predetermined time in the written data.
5. The method of any of claims 1-2, further comprising:
determining a plurality of files included in the written data that are smaller than a predetermined file size; and
merging the plurality of files smaller than the predetermined file size.
6. A method for querying data, comprising:
generating a query request, wherein the query request includes a time range of data to be queried;
parsing the query request to determine edge nodes associated with data to be queried;
obtaining the data to be queried based on the information of the associated edge node and the time range,
wherein the received data is stored at the edge node by: writing the received data to a disk in the form of a log, caching the received data, compressing data of the same time series in the cached data, and writing the compressed data to the disk of the edge node, wherein the written data comprises a key-value pair consisting of data to be stored and a timestamp associated with the data to be stored,
wherein, when the edge nodes associated with the data to be queried comprise a single edge node and the time range does not exceed a preset time range threshold, the single edge node is accessed to obtain the data to be queried, and when the edge nodes associated with the data to be queried comprise at least two edge nodes, the cloud corresponding to the at least two edge nodes is accessed to obtain the data to be queried, wherein the data synchronized from the edge nodes to the cloud is in the form of a log.
7. The method of claim 6, wherein obtaining the data to be queried based on the information of the associated edge node and the time range comprises:
in response to determining that the time range exceeds a preset time range threshold, accessing a cloud corresponding to the edge node to obtain the data to be queried.
8. A data storage device at an edge node, comprising:
a receiving unit configured to receive data to be stored, wherein the received data is written to a disk in the form of a log;
a caching unit configured to cache the data to be stored;
a compression unit configured to compress data of the same time series in the cached data;
a time-series data storage unit configured to write the compressed data, wherein the written data includes a key-value pair consisting of the data to be stored and a timestamp associated with the data to be stored,
wherein the received data is synchronized to a cloud, the synchronized data being in the form of a log, and wherein, when data to be queried is associated with the edge node and a time range of the data to be queried does not exceed a preset time range threshold, the data to be queried is obtained from the edge node, and when the data to be queried is associated with at least two edge nodes, the cloud is accessed to obtain the data to be queried.
9. An apparatus for querying data, comprising:
A query request generation unit configured to generate a query request, wherein the query request includes a time range of data to be queried;
a parsing unit configured to parse the query request to determine an edge node associated with the data to be queried;
a query data acquisition unit configured to acquire the data to be queried based on the information of the associated edge node and the time range,
Wherein the received data is stored at the edge node by: writing the received data to a disk in the form of a log, caching the received data, compressing the data of the same time sequence in the cached data, writing the compressed data to the disk of the edge node, wherein the written data comprises a key value pair consisting of data to be stored and a timestamp associated with the data to be stored,
wherein, when the data to be queried is associated with the edge node and the time range of the data to be queried does not exceed a preset time range threshold, the data to be queried is obtained from the edge node, and when the data to be queried is associated with at least two edge nodes, a cloud is accessed to obtain the data to be queried, wherein the data synchronized from the edge nodes to the cloud is in the form of a log.
10. A database at an edge node, comprising:
a log storage unit configured to store data to be stored in the form of a log;
a caching unit configured to cache the data to be stored, wherein data of the same time series in the cached data is compressed, and the compressed data is written to a disk of the edge node;
a data storage unit comprising:
a meta information storage subunit configured to store meta information associated with the data to be stored;
an index information storage subunit configured to store index information associated with the data to be stored;
a time-series data storage subunit configured to store a key-value pair consisting of the data to be stored and a timestamp associated with the data to be stored;
an edge cloud synchronization unit configured to synchronize the data to be stored to a cloud, wherein the synchronized data is in the form of a log,
wherein, when data to be queried is associated with the edge node and a time range of the data to be queried does not exceed a preset time range threshold, the data to be queried is obtained from the edge node, and when the data to be queried is associated with at least two edge nodes, the cloud is accessed to obtain the data to be queried.
11. The database of claim 10, further comprising:
and a data maintenance unit configured to clear data before a predetermined time in the written data through an expiration mechanism.
12. A computer device, comprising:
a memory, a processor and a computer program stored on the memory,
wherein the processor is configured to execute the computer program to implement the steps of the method of any of claims 1-7.
13. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method of any of claims 1-7.
14. A computer program product comprising a computer program, wherein the computer program when executed by a processor implements the steps of the method of any of claims 1-7.
CN202011563163.2A 2020-12-25 2020-12-25 Data storage method, method for querying data, database, and readable medium Active CN112650755B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011563163.2A CN112650755B (en) 2020-12-25 2020-12-25 Data storage method, method for querying data, database, and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011563163.2A CN112650755B (en) 2020-12-25 2020-12-25 Data storage method, method for querying data, database, and readable medium

Publications (2)

Publication Number Publication Date
CN112650755A CN112650755A (en) 2021-04-13
CN112650755B true CN112650755B (en) 2024-08-13

Family

ID=75363003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011563163.2A Active CN112650755B (en) 2020-12-25 2020-12-25 Data storage method, method for querying data, database, and readable medium

Country Status (1)

Country Link
CN (1) CN112650755B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113507369B (en) * 2021-06-18 2024-07-16 深圳先进技术研究院 Black box data access method based on blockchain and cloud storage
CN113806307B (en) * 2021-08-09 2024-07-23 阿里巴巴(中国)有限公司 Data processing method and device
CN114676130B (en) * 2022-03-02 2024-09-20 阿里巴巴(中国)有限公司 Time sequence data storage method, computing device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177178A (en) * 2019-12-03 2020-05-19 腾讯科技(深圳)有限公司 Data processing method and related equipment
CN111552687A (en) * 2020-03-10 2020-08-18 远景智能国际私人投资有限公司 Storage method, query method, device, device and storage medium for time series data

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003273952A (en) * 2002-03-13 2003-09-26 Yazaki Corp Data transmission device and data transmission method
US20140067758A1 (en) * 2012-08-28 2014-03-06 Nokia Corporation Method and apparatus for providing edge-based interoperability for data and computations
CN104537112B (en) * 2015-01-20 2017-07-14 成都携恩科技有限公司 A method for secure cloud computing
US10681157B2 (en) * 2015-09-11 2020-06-09 International Business Machines Corporation Adaptive event management framework for resource-constrained environments
US20170193041A1 (en) * 2016-01-05 2017-07-06 Sqrrl Data, Inc. Document-partitioned secondary indexes in a sorted, distributed key/value data store
CN108399263B (en) * 2018-03-15 2022-03-01 北京大众益康科技有限公司 Time sequence data storage and query method and storage and processing platform
CN111309720B (en) * 2018-12-11 2024-08-16 北京京东尚科信息技术有限公司 Time sequence data storage and reading method and device, electronic equipment and storage medium
CN112100197B (en) * 2020-07-31 2022-10-28 紫光云(南京)数字技术有限公司 Quasi-real-time log data analysis and statistics method based on Elasticsearch

Also Published As

Publication number Publication date
CN112650755A (en) 2021-04-13

Similar Documents

Publication Publication Date Title
US11392416B2 (en) Automated reconfiguration of real time data stream processing
CN112650755B (en) Data storage method, method for querying data, database, and readable medium
CN113220715B (en) Data processing method, system, computer and readable storage medium
CN109074377B (en) Managed function execution for real-time processing of data streams
EP3318991A1 (en) Monitoring processes running on a platform as a service architecture
WO2019237821A1 (en) Method and apparatus for transmitting scene image of virtual scene, computer device and computer readable storage medium
CN112583898A (en) Business process arranging method and device and readable medium
CN110113407A (en) Mini program state synchronization method, device and computer storage medium
CN112788270B (en) Video backtracking method, device, computer equipment and storage medium
CN113010565A (en) Server cluster-based server real-time data processing method and system
EP2819015B1 (en) Method, terminal, and server for synchronizing terminal mirror
CN111479095B (en) Service processing control system, method and device
CN112035081A (en) Screen projection method and device, computer equipment and storage medium
CN111935663B (en) Sensor data stream processing method, device, medium and electronic equipment
CN107045472A (en) Mobile device information acquisition system
AU2023241318B1 (en) Watermark-based techniques for change-data-capture
EP4042307A1 (en) Method, system, electronic device, and storage medium for storing and collecting temperature data
WO2024222790A1 (en) Decoding method and apparatus applicable to spatial image
US10693736B2 (en) Real time simulation monitoring
CN111090818B (en) Resource management method, resource management system, server and computer storage medium
CN112463864A (en) Data processing method and device and data processing system
CN118540371A (en) Cluster resource management method and device, storage medium and electronic equipment
CN115883647B (en) Service log recording method, system, device, terminal, server and medium
CN111935237B (en) Log processing method and system, electronic device and storage medium
CN118034994A (en) Data processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant