
CN112199427A - A data processing method and system - Google Patents


Info

Publication number
CN112199427A
CN112199427A (application CN202011019732.7A; granted publication CN112199427B)
Authority
CN
China
Prior art keywords
metadata
node
storage
cluster
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011019732.7A
Other languages
Chinese (zh)
Other versions
CN112199427B (en)
Inventor
邓宇
吕文栋
陈晓新
蔡雅琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp filed Critical China Construction Bank Corp
Priority to CN202011019732.7A priority Critical patent/CN112199427B/en
Publication of CN112199427A publication Critical patent/CN112199427A/en
Application granted granted Critical
Publication of CN112199427B publication Critical patent/CN112199427B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/211Schema design and management
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract



The invention discloses a data processing method and system, and relates to the field of computer technology. In a specific implementation, the user module includes a metadata cluster, a computing cluster, and shared storage; the computing cluster sits between the metadata cluster and the shared storage, obtains metadata from the metadata cluster, and writes data to the shared storage. The management module monitors the operation and maintenance information of the user module and manages create, delete, update, and query operations at the metadata-cluster and computing-cluster level. This embodiment decouples computing from storage and introduces virtual nodes to eliminate data migration when storage nodes are expanded or shrunk, achieving a breakthrough in MPP database scalability and concurrency.


Description

Data processing method and system
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method and system.
Background
Traditional storage generally uses an MPP (Massively Parallel Processing) or Oracle RAC architecture database as the underlying technology. Both belong to the shared-nothing architecture, in which computation and storage are tightly coupled. With the rapid growth of data volume in recent years, higher requirements are placed on the storage and workload capacity of the database, and the limitations of this architecture cause the following problems:
1) Limited concurrency: data is scattered across the computing nodes for storage, every computing node must participate in each query execution, and each computing node can only access the data stored locally. When the data volume is huge, redistribution takes longer, so the hardware resources of a single node become a factor restricting the concurrency of the whole cluster.
2) Limited cluster size: once the number of cluster nodes grows beyond a certain point, the probability of node failure increases markedly, and frequent node switchover can leave the database unavailable, which in turn restricts further growth of the cluster.
For these two reasons, breaking through the limits of database scalability and concurrency has become the main factor restricting current database development and affects the large-scale storage of massive data.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method and system, which can at least solve the problem of tight coupling between computation and storage in the prior art.
To achieve the above object, according to another aspect of the embodiments of the present invention, there is provided a data processing system, including a management module and a user module,
the user module comprises a metadata cluster, a computing cluster and a shared storage, wherein the computing cluster is positioned between the metadata cluster and the shared storage and is used for acquiring metadata from the metadata cluster and writing data to the shared storage; wherein the metadata describes the attribute information of the data;
and the management module is used for monitoring the operation and maintenance information of the user module and managing create, delete, update and query operations at the metadata-cluster and computing-cluster level.
Optionally, the metadata cluster includes a scheduling layer, a service layer, and a storage layer;
one side of the scheduling layer is connected with the computing cluster, and the other side of the scheduling layer is connected with the service layer and used for determining a service node in the service layer for processing the metadata service request according to the service type in the metadata service request transmitted by the computing cluster so as to transmit the identifier of the service node to the computing cluster;
the service layer consists of a group of stateless service nodes, one side of which is connected with the computing cluster and the other side with the storage layer; it is used for receiving metadata service requests transmitted by the computing cluster, performing read-write-modify operations on the metadata structure by persisting them to the storage layer, and feeding back the execution result received from the storage layer to the computing cluster;
and the storage layer is connected with the service layer and used for performing read-write modification operation on the metadata structure and transmitting an execution result to the service layer after the execution is finished.
Optionally, the storage layer is also responsible for multi-copy storage of metadata.
Optionally, the computing cluster includes a plurality of sub-computing clusters, each sub-computing cluster is an interface for a user to log in the user module, and includes a management node and a plurality of computing nodes;
the management node acquires metadata from the metadata cluster so as to determine, when a service requirement is received, at least one piece of metadata corresponding to that requirement; it also summarizes the calculation results transmitted by the computing nodes and forwards them;
and the computing node acquires data corresponding to the at least one metadata from the shared storage, performs logic computation on the acquired data, and transmits a computation result to the management node.
Optionally, the computing cluster is further provided with a cache layer, configured to cache data and metadata frequently accessed by the computing cluster; wherein, the frequent access is that the access frequency is greater than or equal to the preset access frequency.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a data processing method including:
the shared storage receives data transmitted by the computing cluster, determines a record where the data is located, and performs hash processing on a key value of the record to obtain a hash value;
determining a physical partition in a hash ring corresponding to the hash value to store the data in a storage node corresponding to the physical partition.
Optionally, before the shared storage receives the data transmitted by the computing cluster, the method further includes:
carrying out hash processing on a node name of a storage node to obtain a first hash value, and determining a first position corresponding to the first hash value in the hash ring;
and according to the clockwise direction, acquiring a next position adjacent to the first position in the hash ring, constructing a physical partition according to the first position and the next position, and establishing a mapping relation between the physical partition and the storage node.
Optionally, one physical partition corresponds to at least one virtual partition;
the method further comprises the following steps:
carrying out hash processing on the node name of a virtual node to obtain a second hash value, and determining a second position corresponding to the second hash value in the hash ring;
according to the clockwise direction, obtaining the next position adjacent to the second position in the hash ring, constructing a virtual partition according to the second position and the next position, and establishing a mapping relation between the virtual node and the virtual partition;
establishing a mapping relation between a physical partition and a virtual partition based on a corresponding relation between a virtual node and a storage node;
the determining the physical partition in the hash ring corresponding to the hash value comprises: and determining a virtual partition corresponding to the hash value in the hash ring, and determining a physical partition based on a mapping relation between the physical partition and the virtual partition.
Optionally, the method further includes: receiving a storage node capacity expansion/reduction instruction, and registering/deleting at least one storage node in a management node of the computing cluster;
and adjusting the corresponding relation between the storage nodes and the virtual nodes according to the current total number of the storage nodes and the total number of the virtual nodes.
Optionally, the method includes:
the scheduling layer in the metadata cluster determines a service node which processes the metadata service request in the service layer according to the service type in the metadata service request transmitted by a management node;
feeding back the identification of the service node to the management node, so that the management node establishes communication connection with the service node according to the identification;
the service node receives the metadata service request transmitted by the management node so as to modify and store the metadata structure in the storage layer;
and after the metadata structure is modified, the storage layer transmits the execution result to the service layer so as to feed back the execution result to the management node through the service layer.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a data processing electronic device.
The electronic device of the embodiment of the invention comprises: one or more processors; a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement any of the data processing methods described above.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program implementing any of the data processing methods described above when executed by a processor.
According to the scheme provided by the invention, one embodiment has the following advantages or beneficial effects: an MPP database architecture that separates computation from storage on top of shared storage is divided, by function, into a management module and a user module. Data is stored on the shared storage and metadata on the metadata cluster, and metadata and data are obtained through inter-cluster communication, which improves operating efficiency. Even if the storage nodes later need to be expanded, only the correspondence between storage nodes and virtual nodes changes; the data associated with the virtual nodes is not migrated, which overcomes the drawback that existing clusters must redistribute data during expansion or reduction.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic main flow chart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a metadata cluster structure according to an embodiment of the present invention;
FIG. 3 is a flow chart diagram of a data processing method according to an embodiment of the invention;
FIGS. 4(a) to 4(c) are schematic views showing the construction of the hash ring;
FIG. 5 is a flow diagram illustrating an alternative data processing method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a capacity expansion node according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a metadata cluster operation mechanism according to an embodiment of the present invention;
FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 9 is a schematic block diagram of a computer system suitable for use with a mobile device or server implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Referring to fig. 1, a main architecture diagram of a data processing system according to an embodiment of the present invention is shown, which is mainly divided into two parts, a user module and a management module, wherein:
1. The management module, also called the management console, is the entry point for full-lifecycle cluster management. It monitors the operation and maintenance information of the user module and manages create, delete, update and query operations at the metadata-cluster and computing-cluster level. Its main functions include calling the IaaS-layer interface to create clusters; daily operations such as automatic deployment, cluster start/stop, capacity expansion and upgrades; and displaying operation and maintenance monitoring information such as data state, cluster health state and fault recovery state.
2. The user module is divided into three parts: a metadata cluster, a computing cluster, and shared storage.
1) The metadata cluster only stores metadata and can be divided into a scheduling layer, a service layer and a storage layer according to a logical architecture, which is specifically shown in fig. 2.
The scheduling layer is the first layer; it manages metadata (the attribute information of data) and mainly performs global coordination and scheduling, including access control, query optimization and the like. Specifically, according to the service type of a metadata service request transmitted by the management node, it determines the service node in the service layer that will process the request, and transmits that node's identifier to the management node.
The service layer, on the second layer, consists of a group of stateless service nodes; it receives metadata service requests transmitted by the computing cluster, performs read-write-modify operations on the metadata structure by persisting them to the storage layer, and feeds back the execution result received from the storage layer to the computing cluster.
The storage layer, on the third layer, performs read-write-modify operations on the metadata structure and transmits the execution result to the service layer once execution finishes. In addition, the storage layer is responsible for multi-copy storage of metadata, achieving load balance overall and guaranteeing high availability of the cluster.
2) The computing cluster, located between the metadata cluster and the shared storage, comprises a plurality of sub-computing clusters. From the user's perspective, each sub-computing cluster is an independent MPP database and comprises a management node (master) and a plurality of computing nodes:
the management node is an inlet of the database and is responsible for scheduling the work of all systems of the whole database, a request sent by a user is analyzed and optimized by the management node, and the request is distributed to the computing nodes according to a task distribution scheme of an optimal query plan. And if the weights of all middle school students need to be acquired, determining at least one piece of metadata corresponding to the business requirement.
Secondly, each computing node executes its operations according to the query plan: for example, it acquires the data corresponding to at least one piece of metadata from the shared storage, performs logic computation on the acquired data, and feeds the result back to the management node, which summarizes the results and forwards them to the client.
As shown in fig. 1, a cache layer is also arranged on the computing cluster in this solution: in front of the shared storage, the local storage space of the computing cluster is used to cache data and metadata that the cluster accesses frequently (access frequency greater than or equal to a preset access frequency). The database maintains an access-frequency statistics system table that records and identifies hot data, and this hot data is what the cache layer stores.
When such data or metadata is accessed, i.e. on a cache hit, the computing node only needs to read from its local disk and does not need to remotely access the distributed storage system on the shared storage or the metadata cluster, thereby ensuring that I/O throughput does not degrade. This caching mechanism strongly guarantees efficient access to common hot data and metadata.
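The cache-layer behavior described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the patent's implementation: the class name `HotCache`, the threshold value, and the `remote_fetch` callback are all assumptions made for the example.

```python
# Hypothetical sketch of a hot-data cache layer: a compute node counts
# accesses per key and keeps a local copy of any entry whose frequency
# reaches a preset threshold, so later reads skip remote storage.

class HotCache:
    def __init__(self, threshold=3):
        self.threshold = threshold   # preset access frequency
        self.counts = {}             # key -> observed access count
        self.local = {}              # locally cached hot entries

    def get(self, key, remote_fetch):
        """Return a value, caching it locally once it becomes 'hot'."""
        if key in self.local:        # cache hit: no remote I/O needed
            return self.local[key]
        self.counts[key] = self.counts.get(key, 0) + 1
        value = remote_fetch(key)    # remote access to shared storage
        if self.counts[key] >= self.threshold:
            self.local[key] = value  # promote to hot data
        return value

cache = HotCache(threshold=2)
shared_storage = {"row1": "data1"}
cache.get("row1", shared_storage.__getitem__)  # remote fetch, count = 1
cache.get("row1", shared_storage.__getitem__)  # count = 2, now cached locally
```

A real implementation would also bound the cache size and evict stale entries; the point here is only the frequency-threshold promotion the text describes.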
3) The shared storage is used for storing user data in a unified manner, and the implementation mechanism in the shared storage is described with reference to fig. 3 to 5.
In the scheme, the computing resources of the database use physical resources in the computing cluster, and user data are stored in unified shared storage, so that decoupling of computing and storage is realized, and strict physical isolation is performed.
The method provided by the embodiment is based on the MPP database calculation and storage separation architecture of the shared storage, and is divided into a management module and a user module according to functions. Data is stored on a shared storage, metadata is stored on a metadata cluster, and metadata/data acquisition needs to be communicated among the clusters, so that the operation efficiency is improved.
Referring to fig. 3, a flow chart of a data processing method according to an embodiment of the present invention is shown, which includes the following steps:
s301: the shared storage receives data transmitted by a computing cluster, determines a record where the data is located, and performs hash processing on a key value of the record to obtain a hash value;
s302: determining a physical partition in a hash ring corresponding to the hash value to store the data in a storage node corresponding to the physical partition.
In the above embodiment, in steps S301 and S302, the key value that determines data distribution in an MPP database is called the distribution key; generally, a column of the structured data table is chosen as the table's distribution key. The data distribution strategy adopted by GP4 and GP5 (versions 4 and 5 of the Greenplum database) is modulo: a hash value is calculated from the value of the distribution key of each record, then taken modulo the cluster size (number of partitions), and the record is stored in the partition corresponding to the remainder. Provided the data itself is not skewed, this strategy distributes it roughly evenly across the partitions.
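The hash-modulo strategy just described can be sketched as follows. The hash function and key names are illustrative choices, not the patent's:

```python
# Minimal sketch of hash-modulo distribution: hash the record's
# distribution key, take the remainder modulo the partition count.
import hashlib

def partition_for(dist_key: str, num_partitions: int) -> int:
    h = int(hashlib.md5(dist_key.encode()).hexdigest(), 16)
    return h % num_partitions

# Place four records across 4 partitions.
placement = {k: partition_for(k, 4) for k in ("r1", "r2", "r3", "r4")}
# Recomputing after a resize to 5 partitions shows which records
# would have to move under this strategy.
moved = [k for k in placement if partition_for(k, 5) != placement[k]]
```

Because the modulus itself changes on resize, most mappings are invalidated, which is exactly the weakness the next paragraphs address.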
In the hash-modulo mode, when the cluster is expanded or shrunk, the one-way nature of the hash operation means the original mapping is almost completely invalidated, and the data on the storage nodes must be hash-distributed again. Data moves between nodes over the network, which incurs heavy network and I/O overhead: on one hand this seriously affects queries running during the resize, and on the other hand it lengthens the time window needed for expansion or reduction.
To solve these problems, a consistent hashing algorithm is introduced. Consistent hashing still uses a standard hash function to compute the hash value; unlike the modulo approach, it divides the computed hash values into different physical partitions and maps those partitions to different storage nodes, thereby overcoming the drawbacks of the existing modulo approach.
The hash value returned by the standard hash function is mapped into the integer space [0, 2^32-1] (this value is merely an example, assuming the hash value is a 32-bit unsigned integer); joining the ends of this space gives the hash ring shown in FIG. 4(a). Data to be stored is mapped into the ring address space by the hash algorithm, for example (each object is a piece of data to be stored):
hash(object1)=key1
hash(object2)=key2
hash(object3)=key3
hash(object4)=key4
the calculated key value is stored in the corresponding position of the ring, and the mapping of the node hash shown in fig. 4(b) can be obtained.
The storage node of the consistent hash can map the node name node to the annular address space through the hash algorithm:
hash(node1)=pos1
hash(node2)=pos2
each node corresponds to a node storing data, and according to the node position (i.e., the first position):
node1 stores the data in the pos1 → pos2 range (key2, key3)
node2 stores the data in the pos2 → pos1 range (key4, key1)
It can be seen that each storage node only stores data between its location and the next adjacent location in the hash ring, and the two locations form a physical partition, and the final structure is shown in fig. 4 (c).
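The ring construction and lookup described above can be sketched as follows. This is a generic consistent-hashing illustration under the stated assumptions (MD5 folded into a 32-bit space); the class and function names are not from the patent:

```python
# Sketch of a consistent hash ring: node names are hashed onto the ring,
# and each key belongs to the node whose position starts the arc
# containing the key (i.e., the nearest node position at or before it).
import bisect
import hashlib

RING = 2 ** 32  # hash values mapped onto [0, 2^32 - 1]

def h(name: str) -> int:
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % RING

class HashRing:
    def __init__(self, nodes):
        self.positions = sorted((h(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        pos = h(key)
        keys = [p for p, _ in self.positions]
        # Largest node position <= key position; index -1 wraps around
        # the ring to the last node, matching the circular layout.
        i = bisect.bisect_right(keys, pos) - 1
        return self.positions[i][1]

ring = HashRing(["node1", "node2"])
owner = ring.node_for("object1")  # one of node1 / node2
```

The benefit over modulo: removing one node only reassigns the keys that node owned; keys owned by the surviving nodes keep their placement.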
Therefore, when the shared storage receives new data, it first hashes the data's distribution key in the table, then determines the physical partition in the hash ring corresponding to the hash value, and finally stores the data in the storage node corresponding to that physical partition, completing the data distribution.
In the method provided by the embodiment, the calculated hash value is divided into different physical partitions, and the different physical partitions are mapped to different storage nodes, so that the whole is displayed in the form of a hash ring, and the defect that data needs to be redistributed when the existing cluster is subjected to capacity expansion and capacity reduction is overcome.
Referring to fig. 5, a schematic flow chart of an alternative data processing method according to an embodiment of the present invention is shown, including the following steps:
s501: carrying out hash processing on the node name of a virtual node to obtain a second hash value, and determining a second position corresponding to the second hash value in a hash ring;
s502: according to the clockwise direction, obtaining the next position adjacent to the second position in the hash ring, constructing a virtual partition according to the second position and the next position, and establishing a mapping relation between the virtual node and the virtual partition;
s503: carrying out hash processing on a node name of a storage node to obtain a first hash value, and determining a first position corresponding to the first hash value in the hash ring;
s504: according to the clockwise direction, acquiring a next position adjacent to the first position in the hash ring, constructing a physical partition according to the first position and the next position, and establishing a mapping relation between the physical partition and the storage node;
s505: establishing a mapping relation between a physical partition and a virtual partition based on a corresponding relation between a virtual node and a storage node; wherein one physical partition corresponds to at least one virtual partition;
s506: the shared storage receives data transmitted by a computing cluster, determines a record where the data is located, and performs hash processing on a key value of the record to obtain a hash value;
s507: determining a virtual partition corresponding to the hash value in the hash ring, and determining a physical partition based on a mapping relation between the physical partition and the virtual partition so as to store the data into a storage node corresponding to the physical partition.
In the above embodiment, step S506 may refer to the description of step S301 shown in fig. 3, and is not described herein again.
In the above embodiment, regarding steps S501 to S505 and S507, adding or removing storage nodes inevitably moves data on some of the storage nodes, which unbalances resources in the system and affects cluster operating efficiency. Virtual nodes are therefore introduced to solve this problem.
A fixed number of virtual nodes is preset (far larger than the number of actual physical partitions). The node name of each virtual node is processed by the hash algorithm to obtain a second hash value, and the position pos corresponding to that value in the hash ring is determined. Then, in the same way that physical partitions are determined, a virtual partition is constructed clockwise from this pos to the next position, and a mapping between the virtual node and the virtual partition is established.
Through these steps, records are distributed evenly over the virtual partitions, which are then mapped to physical partitions through the correspondence between virtual nodes and storage nodes. Since the number of virtual partitions far exceeds the number of physical partitions, one physical partition usually corresponds to multiple virtual partitions (e.g., 1:4), and the data of those virtual partitions is stored on the corresponding physical partition. This is the consistent-hash data distribution policy of GP6.
In this scheme, the data of each virtual partition is stored as an independent file, persisted on the shared storage, and the mapping from data to virtual partition is retained. Under this policy, expanding or shrinking the physical partitions only dynamically adjusts the virtual-partition-to-physical-partition mapping; the underlying data files on the shared storage need not change, and no data reads/writes or network transfers are involved. The time window required for expansion is fixed (start a new virtual machine and register it with the master node), and adding or deleting nodes bears no relation to the existing data volume in the cluster, truly achieving second-level expansion/reduction.
As shown in fig. 6, the file of each virtual node is stored independently on the shared storage, and the correspondence between storage node 1, storage node 2, and the virtual nodes is stored. When the management module issues a capacity-expansion instruction, storage node 3 is registered with the master node; part of the correspondences of storage node 1 and storage node 2 are invalidated and adjusted to point some of the virtual nodes at storage node 3, and the process involves no data movement.
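The expansion step of fig. 6 amounts to rewriting part of that mapping table. A minimal sketch, in which the node names and the even-share reassignment policy are assumptions:

```python
def expand(vnode_to_storage: dict, new_node: str) -> dict:
    """Register a new storage node and reassign an even share of virtual
    nodes to it. Only the mapping changes; the per-virtual-node data files
    on shared storage are untouched, so no data moves over the network."""
    mapping = dict(vnode_to_storage)
    old_nodes = sorted(set(mapping.values()))
    total_nodes = len(old_nodes) + 1
    share = len(mapping) // total_nodes  # virtual nodes the new node takes

    # Invalidate part of the old correspondences and point them at new_node.
    victims = sorted(mapping)[:share]
    for vnode in victims:
        mapping[vnode] = new_node
    return mapping

before = {f"vnode-{i}": ["storage-node-1", "storage-node-2"][i % 2]
          for i in range(16)}
after = expand(before, "storage-node-3")
```

With 16 virtual nodes and a third storage node, 16 // 3 = 5 virtual nodes are reassigned; the remaining 11 keep their old owner, and no file on shared storage is rewritten.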
In general, the number of virtual partitions that one physical partition corresponds to is determined by the ratio of the total number of storage nodes to the total number of virtual nodes, and this value changes as storage nodes are added or removed.
In the method provided by this embodiment, a fixed number of virtual nodes is set, together with the correspondence between the virtual nodes and the data. Even when nodes are added or removed, the only correspondence that changes is the one between storage nodes and virtual nodes; no data migration is involved, network and I/O consumption is avoided, and low-cost second-level scaling is truly achieved.
Referring to fig. 7, which shows a schematic diagram of a metadata-cluster operation mechanism according to an embodiment of the present invention, where master1 is the master node of a sub-computing cluster, the mechanism includes the following steps:
1. master1 of a computing cluster sends a metadata service request to the scheduling layer of the metadata cluster. Because each catalog node has a different division of work and handles a different type of request, after receiving the service request the scheduling layer determines, according to the service type in the request, the catalog node in the service layer that can process it.
2. The scheduling layer feeds the id of the designated service node back to master1;
3. master1 establishes a connection with the corresponding service node in the service layer according to the service-node id fed back by the scheduling layer, and then sends a metadata read-write request;
4. The service node receives the metadata service request transmitted by master1, performs a table-locking operation according to the content of the request, and performs the metadata read-write modification by writing to the storage layer.
Table locks occur on insert, update, and delete. The database uses an exclusive blocking mechanism: when executing these statements, the table is locked until a commit (which saves the modifications made by the transaction to the database), a rollback, or an exit by the database user occurs. For example, if program A performs an insert on tableA and program B also performs an insert on tableA before A has committed, a "resource busy" exception occurs; this is the table lock. Table locks arise under concurrency rather than parallelism: while one thread is operating on the database, another thread cannot.
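This exclusive-until-commit behavior can be observed with any transactional engine; the sketch below uses SQLite purely as an illustration (the patent does not name a specific database), where a second connection's insert fails with a "locked" error until the first connection commits:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Program A: insert without committing, holding the write lock on the table.
conn_a = sqlite3.connect(path, timeout=0.2)
conn_a.execute("CREATE TABLE tableA (id INTEGER)")
conn_a.commit()
conn_a.execute("INSERT INTO tableA VALUES (1)")  # transaction open, uncommitted

# Program B: the same insert now hits the lock and raises "database is locked",
# SQLite's analogue of the "resource busy" exception described above.
conn_b = sqlite3.connect(path, timeout=0.2)
try:
    conn_b.execute("INSERT INTO tableA VALUES (2)")
    blocked = False
except sqlite3.OperationalError:
    blocked = True

# Once A commits (saving its modifications), B's insert succeeds.
conn_a.commit()
conn_b.execute("INSERT INTO tableA VALUES (2)")
conn_b.commit()
```

The short `timeout` makes B fail fast instead of waiting the default five seconds for the lock to be released.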
5. After the metadata read-write modification is finished, the storage layer feeds the execution result back to the service layer.
6. The service layer feeds the execution result back to master1, completing the processing of the metadata service request.
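Steps 1-6 above can be condensed into a small routing sketch; the service types, node ids, and in-memory stores below are invented for illustration only:

```python
# Scheduling layer: routes a request to the catalog node that handles its
# service type (steps 1-2), mirroring the different divisions of work.
SERVICE_TABLE = {            # service type -> service-node id (assumed names)
    "table_meta": "svc-1",
    "partition_meta": "svc-2",
}

METADATA_STORE = {}          # storage layer: persisted metadata structures
LOCKS = set()                # tables currently locked by an open operation

def handle_request(request: dict) -> dict:
    # Steps 1-2: the scheduling layer picks the service node by service type.
    node_id = SERVICE_TABLE[request["service_type"]]

    # Steps 3-4: the service node locks the table named in the request, then
    # performs the read-write modification against the storage layer.
    table = request["table"]
    if table in LOCKS:
        return {"node": node_id, "ok": False, "error": "resource busy"}
    LOCKS.add(table)
    try:
        METADATA_STORE[table] = request["payload"]
        result = {"node": node_id, "ok": True}    # step 5: storage -> service
    finally:
        LOCKS.discard(table)                      # commit releases the lock
    return result                                 # step 6: service -> master1

resp = handle_request({"service_type": "table_meta",
                       "table": "t_orders",
                       "payload": {"columns": ["id", "amount"]}})
```

A second request against a table already in `LOCKS` would receive the "resource busy" response, matching the table-lock behavior described above.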
The implementation details of the components described in the embodiments of the present invention have been described in detail in the above methods, and are therefore not repeated here.
FIG. 8 illustrates an exemplary system architecture 800 to which embodiments of the invention may be applied.
As shown in fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805 (by way of example only). The network 804 is used to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. The network 804 may include various types of connections, such as wired links, wireless communication links, or fiber-optic cables.
A user may use the terminal devices 801, 802, 803 to interact with a server 805 over a network 804 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 801, 802, 803.
The terminal devices 801, 802, 803 may be various electronic devices having display screens and supporting web browsing, and the server 805 may be a server providing various services.
It is to be noted that the method provided by the embodiment of the present invention is generally executed by the server 805, and accordingly, the apparatus is generally disposed in the server 805.
It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 9, shown is a block diagram of a computer system 900 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 901.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor including a user module and a management module. The names of these modules do not in some cases limit the modules themselves; for example, the user module may also be described as a "database entry module".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following:
the shared storage receives data transmitted by a computing cluster, determines a record where the data is located, and performs hash processing on a key value of the record to obtain a hash value;
determining a physical partition in a hash ring corresponding to the hash value to store the data in a storage node corresponding to the physical partition.
Compared with the prior art, the technical solution provided by the embodiments of the present invention has at least the following beneficial effects:
1. The method breaks the limitation of single-cluster scale under massive-data scenarios, realizes horizontal linear growth in the number of nodes, avoids the massive inter-cluster data-copy operations caused by splitting clusters, and saves ETL scheduling resources.
2. While sharing the same set of metadata, strict resource isolation is achieved among the computing clusters; the CPU and memory usage of users can be reasonably planned, avoiding contention for computing resources among different users.
3. Traditional database concurrency is limited by the hardware resources of the nodes; the invention realizes linear superposition of concurrency through a multi-computing-cluster mode.
4. Data is persistently stored in three copies based on the shared storage, so database-cluster failures do not affect data storage, providing a favorable guarantee for high data availability.
5. The files of the virtual nodes are stored independently, and the correspondence between physical nodes and virtual nodes is stored, so that expansion/contraction only requires adjusting this correspondence, without data redistribution, truly achieving second-level scaling.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A data processing system, comprising a management module and a user module, wherein: the user module comprises a metadata cluster, a computing cluster, and a shared storage; the computing cluster is located between the metadata cluster and the shared storage, and is used for obtaining metadata from the metadata cluster and persisting data to the shared storage, wherein the metadata is used to describe attribute information of the data; the management module is used to monitor the operation and maintenance information of the user module, and to manage add, delete, modify, and query operations at the metadata-cluster and computing-cluster level.

2. The system according to claim 1, wherein the metadata cluster comprises a scheduling layer, a service layer, and a storage layer; one side of the scheduling layer is connected to the computing cluster and the other side to the service layer, for determining, according to the service type in a metadata service request transmitted by the computing cluster, the service node in the service layer that processes the metadata service request, so as to transmit the identifier of the service node to the computing cluster; the service layer consists of a group of stateless service nodes, with one side connected to the computing cluster and the other side to the storage layer, for receiving the metadata service request transmitted by the computing cluster, performing read-write and modification operations on the metadata structure on the storage layer, and feeding back the execution result received from the storage layer to the computing cluster; the storage layer is connected to the service layer, for performing the read-write and modification operations on the metadata structure and, after execution is completed, transmitting the execution result to the service layer.

3. The system according to claim 2, wherein the storage layer is further responsible for storing metadata in multiple copies.

4. The system according to claim 1, wherein the computing cluster comprises a plurality of sub-computing clusters, each sub-computing cluster being an interface through which a user logs in to the user module and comprising one management node and a plurality of computing nodes; the management node obtains metadata from the metadata cluster so as to determine, upon receiving a service requirement, at least one piece of metadata corresponding to the service requirement, and aggregates and forwards the calculation results transmitted by the computing nodes; the computing nodes obtain the data corresponding to the at least one piece of metadata from the shared storage, perform logical calculation on the obtained data, and transmit the calculation results to the management node.

5. The system according to claim 1, wherein the computing cluster is further provided with a cache layer for caching data and metadata frequently accessed by the computing cluster, wherein frequently accessed means an access frequency greater than or equal to a preset access frequency.

6. A data processing method using the data processing system according to claim 1, comprising: the shared storage receiving data transmitted by the computing cluster, determining the record in which the data is located, and hashing the key value of the record to obtain a hash value; determining the physical partition in a hash ring corresponding to the hash value, so as to store the data in the storage node corresponding to the physical partition.

7. The method according to claim 6, further comprising, before the shared storage receives the data transmitted by the computing cluster: hashing the node name of a storage node to obtain a first hash value, and determining a first position in the hash ring corresponding to the first hash value; obtaining, in a clockwise direction, the next position adjacent to the first position in the hash ring, constructing a physical partition from the first position and the next position, and establishing a mapping relationship between the physical partition and the storage node.

8. The method according to claim 6 or 7, wherein one physical partition corresponds to at least one virtual partition; the method further comprising: hashing the node name of a virtual node to obtain a second hash value, and determining a second position in the hash ring corresponding to the second hash value; obtaining, in a clockwise direction, the next position adjacent to the second position in the hash ring, constructing a virtual partition from the second position and the next position, and establishing a mapping relationship between the virtual node and the virtual partition; establishing a mapping relationship between physical partitions and virtual partitions based on the correspondence between virtual nodes and storage nodes; wherein determining the physical partition in the hash ring corresponding to the hash value comprises: determining the virtual partition in the hash ring corresponding to the hash value, and determining the physical partition based on the mapping relationship between physical partitions and virtual partitions.

9. The method according to claim 8, further comprising: receiving a storage-node expansion/contraction instruction, and registering/deleting at least one storage node in a management node of the computing cluster; adjusting the correspondence between storage nodes and virtual nodes according to the current total number of storage nodes and the total number of virtual nodes.

10. The method according to claim 6, comprising: the scheduling layer in the metadata cluster determining, according to the service type in a metadata service request transmitted by a management node, the service node in the service layer that processes the metadata service request; feeding back the identifier of the service node to the management node, so that the management node establishes a communication connection with the service node according to the identifier; the service node receiving the metadata service request transmitted by the management node, so as to modify and save the metadata structure in the storage layer; after the metadata structure is modified, the storage layer transmitting the execution result to the service layer, so as to feed back the execution result to the management node through the service layer.

11. An electronic device, comprising: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 6-10.

12. A computer-readable medium on which a computer program is stored, wherein when the program is executed by a processor, the method according to any one of claims 6-10 is implemented.
CN202011019732.7A 2020-09-24 2020-09-24 A data processing method and system Active CN112199427B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011019732.7A CN112199427B (en) 2020-09-24 2020-09-24 A data processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011019732.7A CN112199427B (en) 2020-09-24 2020-09-24 A data processing method and system

Publications (2)

Publication Number Publication Date
CN112199427A true CN112199427A (en) 2021-01-08
CN112199427B CN112199427B (en) 2024-12-27

Family

ID=74008057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011019732.7A Active CN112199427B (en) 2020-09-24 2020-09-24 A data processing method and system

Country Status (1)

Country Link
CN (1) CN112199427B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867035A (en) * 2012-08-28 2013-01-09 浪潮(北京)电子信息产业有限公司 High-availability method and device of distributed document system cluster
CN103795801A (en) * 2014-02-12 2014-05-14 浪潮电子信息产业股份有限公司 Metadata group design method based on real-time application group
CN109491807A (en) * 2018-11-01 2019-03-19 浪潮软件集团有限公司 Data exchange method, device and system
CN109522283A (en) * 2018-10-30 2019-03-26 深圳先进技术研究院 A kind of data de-duplication method and system
CN109739684A (en) * 2018-11-20 2019-05-10 清华大学 The copy restorative procedure and device of distributed key value database based on vector clock
CN110471613A (en) * 2018-05-09 2019-11-19 杭州海康威视系统技术有限公司 The method of storing data, the method, apparatus and system for reading data


Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112698926A (en) * 2021-03-25 2021-04-23 成都新希望金融信息有限公司 Data processing method, device, equipment, storage medium and system
CN113886037A (en) * 2021-09-14 2022-01-04 北京东方金信科技股份有限公司 A method and system for realizing data distribution in a distributed database cluster
CN113934707A (en) * 2021-10-09 2022-01-14 京东科技信息技术有限公司 Cloud native database, database capacity expansion method, database capacity reduction method and device
CN114003644A (en) * 2021-10-21 2022-02-01 河南星环众志信息科技有限公司 Distributed transaction processing method, device, medium and database system
CN114020836A (en) * 2021-10-28 2022-02-08 国电南京自动化股份有限公司 Distributed industrial SCADA system measurement data processing method based on time sequence library
CN114020836B (en) * 2021-10-28 2025-02-07 国电南京自动化股份有限公司 A distributed industrial SCADA system measurement data processing method based on time series library
CN114357049A (en) * 2022-01-07 2022-04-15 苏州浪潮智能科技有限公司 Storage cluster interconnection method and device, computer equipment and storage medium
CN114357049B (en) * 2022-01-07 2024-01-19 苏州浪潮智能科技有限公司 Storage cluster interconnection method, device, computer equipment and storage medium
WO2024119504A1 (en) * 2022-12-09 2024-06-13 华为技术有限公司 Data processing method, apparatus, device and system
CN116049137A (en) * 2022-12-26 2023-05-02 海尔优家智能科技(北京)有限公司 Method and device for managing time-series database system, node device, storage medium
CN116095099A (en) * 2023-01-20 2023-05-09 广东省中山市质量计量监督检测所 Machine vision-based mechanical part quality inspection system
CN116204590A (en) * 2023-02-28 2023-06-02 北京人大金仓信息技术股份有限公司 Data processing method, readable storage medium and computer device for database cluster
WO2025002006A1 (en) * 2023-06-26 2025-01-02 华为云计算技术有限公司 Transaction processing method and system
WO2025086688A1 (en) * 2023-10-27 2025-05-01 华为云计算技术有限公司 Method for processing data, system, and computing device cluster
CN117573614A (en) * 2023-11-16 2024-02-20 天翼云科技有限公司 A system and method for reducing metadata and migrating only a small amount of data when adding or deleting nodes
CN117971506A (en) * 2024-03-29 2024-05-03 天津南大通用数据技术股份有限公司 MPP database query task balancing method, system, equipment and medium
CN117971506B (en) * 2024-03-29 2024-06-18 天津南大通用数据技术股份有限公司 MPP database query task balancing method, system, equipment and medium
CN117997896A (en) * 2024-04-03 2024-05-07 环球数科集团有限公司 A data transmission, storage and access system based on IPFS protocol
CN117997896B (en) * 2024-04-03 2024-06-04 环球数科集团有限公司 A data transmission, storage and access system based on IPFS protocol

Also Published As

Publication number Publication date
CN112199427B (en) 2024-12-27

Similar Documents

Publication Publication Date Title
CN112199427B (en) A data processing method and system
US10671695B2 (en) System and method for storing and processing database requests
US20200242129A1 (en) System and method to improve data synchronization and integration of heterogeneous databases distributed across enterprise and cloud using bi-directional transactional bus of asynchronous change data system
US8108352B1 (en) Data store replication for entity based partition
AU2013271538B2 (en) Data management and indexing across a distributed database
US9489443B1 (en) Scheduling of splits and moves of database partitions
JP7549137B2 (en) Transaction processing method, system, device, equipment, and program
US7076553B2 (en) Method and apparatus for real-time parallel delivery of segments of a large payload file
US10922303B1 (en) Early detection of corrupt data partition exports
CN112948178A (en) Data processing method, device, system, equipment and medium
US11461201B2 (en) Cloud architecture for replicated data services
CN111459913B (en) Capacity expansion method and device of distributed database and electronic equipment
CN114003580A (en) Database construction method and device applied to distributed scheduling system
US11609933B1 (en) Atomic partition scheme updates to store items in partitions of a time series database
CN110826993A (en) Project management processing method, device, storage medium and processor
CN118673086B (en) Data processing method, device, electronic device and computer readable storage medium
CN105511966B (en) A method and system for business segmentation and optimization of database clusters
US11157454B2 (en) Event-based synchronization in a file sharing environment
CN120560825A (en) Performance optimization method, device, computer equipment, readable storage medium and program product of power data system
Sukhija et al. Load balancing and fault tolerance mechanisms for scalable and reliable big data analytics
Chejarla A Novel Stateful Orchestration Pattern for Data Affinity and Transactional Integrity in Sharded Backend Architectures
CN117560370A (en) Fragment storage method and device
CN119537481A (en) Database synchronization method, device, equipment, medium and program product
CN115550458A (en) Log processing method and related device
HK40037752A (en) Transaction processing method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant