
CN112199427A - A data processing method and system - Google Patents


Info

Publication number
CN112199427A
CN112199427A (application CN202011019732.7A; granted publication CN112199427B)
Authority
CN
China
Prior art keywords
metadata
node
storage
cluster
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011019732.7A
Other languages
Chinese (zh)
Other versions
CN112199427B (en)
Inventor
邓宇
吕文栋
陈晓新
蔡雅琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp filed Critical China Construction Bank Corp
Priority to CN202011019732.7A priority Critical patent/CN112199427B/en
Publication of CN112199427A publication Critical patent/CN112199427A/en
Application granted granted Critical
Publication of CN112199427B publication Critical patent/CN112199427B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/211Schema design and management
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract



The invention discloses a data processing method and system, and relates to the field of computer technology. In a specific implementation, the user module includes a metadata cluster, a computing cluster, and shared storage; the computing cluster sits between the metadata cluster and the shared storage, obtains metadata from the metadata cluster, and writes data to the shared storage. The management module monitors the operation and maintenance information of the user module and manages create, delete, update, and query operations at the metadata-cluster and computing-cluster level. This embodiment decouples computing from storage and introduces virtual nodes to eliminate data migration when storage nodes are expanded or shrunk, achieving a breakthrough in MPP database scalability and concurrency.


Description

Data processing method and system
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method and system.
Background
Traditional storage generally uses an MPP (Massively Parallel Processing) or Oracle RAC architecture database as the underlying technology. Both belong to the shared-nothing architecture, in which computation and storage are tightly coupled. With the rapid growth of data volume in recent years, higher requirements are placed on the storage and workload capacity of the database, and the limitations of this architecture cause the following problems:
1) Limited concurrency: data is scattered across the computing nodes for storage, every computing node must participate in each query execution, and each computing node can only access the data stored locally. When the data volume is huge, redistribution takes longer, so the hardware resources of a single node become a factor restricting the concurrency of the whole cluster.
2) Limited cluster size: once the number of cluster nodes grows beyond a certain point, the probability of node failure increases markedly, and frequent node switchover can leave the database unavailable, which in turn restricts further growth of the cluster.
For these two reasons, breaking through the limits of database scalability and concurrency has become the main factor restricting current database development and affects the large-scale storage of massive data.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method and system, which can at least solve the problem of tight coupling between computation and storage in the prior art.
To achieve the above object, according to another aspect of the embodiments of the present invention, there is provided a data processing system, including a management module and a user module,
the user module comprises a metadata cluster, a computing cluster and a shared storage, wherein the computing cluster is positioned between the metadata cluster and the shared storage and is used for acquiring metadata from the metadata cluster and writing data to the shared storage; wherein the metadata describes the attribute information of the data;
and the management module is used for monitoring the operation and maintenance information of the user module and managing create, delete, update and query operations at the metadata-cluster and computing-cluster level.
Optionally, the metadata cluster includes a scheduling layer, a service layer, and a storage layer;
one side of the scheduling layer is connected with the computing cluster, and the other side of the scheduling layer is connected with the service layer and used for determining a service node in the service layer for processing the metadata service request according to the service type in the metadata service request transmitted by the computing cluster so as to transmit the identifier of the service node to the computing cluster;
the service layer consists of a group of stateless service nodes, one side of which is connected with the computing cluster and the other side with the storage layer; it is used for receiving metadata service requests transmitted by the computing cluster, performing read-write-modify operations on the metadata structure by persisting them to the storage layer, and feeding back the execution result received from the storage layer to the computing cluster;
and the storage layer is connected with the service layer and used for performing read-write modification operation on the metadata structure and transmitting an execution result to the service layer after the execution is finished.
Optionally, the storage layer is also responsible for multi-copy storage of metadata.
Optionally, the computing cluster includes a plurality of sub-computing clusters, each sub-computing cluster is an interface for a user to log in the user module, and includes a management node and a plurality of computing nodes;
the management node acquires metadata from the metadata cluster so as to determine, when a service requirement is received, at least one piece of metadata corresponding to that requirement; it also summarizes the calculation results transmitted by the computing nodes and forwards them;
and the computing node acquires data corresponding to the at least one metadata from the shared storage, performs logic computation on the acquired data, and transmits a computation result to the management node.
Optionally, the computing cluster is further provided with a cache layer, configured to cache data and metadata frequently accessed by the computing cluster; wherein, the frequent access is that the access frequency is greater than or equal to the preset access frequency.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a data processing method including:
the shared storage receives data transmitted by the computing cluster, determines a record where the data is located, and performs hash processing on a key value of the record to obtain a hash value;
determining a physical partition in a hash ring corresponding to the hash value to store the data in a storage node corresponding to the physical partition.
Optionally, before the shared storage receives the data transmitted by the computing cluster, the method further includes:
carrying out hash processing on a node name of a storage node to obtain a first hash value, and determining a first position corresponding to the first hash value in the hash ring;
and according to the clockwise direction, acquiring a next position adjacent to the first position in the hash ring, constructing a physical partition according to the first position and the next position, and establishing a mapping relation between the physical partition and the storage node.
Optionally, one physical partition corresponds to at least one virtual partition;
the method further comprises the following steps:
carrying out hash processing on the node name of a virtual node to obtain a second hash value, and determining a second position corresponding to the second hash value in the hash ring;
according to the clockwise direction, obtaining the next position adjacent to the second position in the hash ring, constructing a virtual partition according to the second position and the next position, and establishing a mapping relation between the virtual node and the virtual partition;
establishing a mapping relation between a physical partition and a virtual partition based on a corresponding relation between a virtual node and a storage node;
the determining the physical partition in the hash ring corresponding to the hash value comprises: and determining a virtual partition corresponding to the hash value in the hash ring, and determining a physical partition based on a mapping relation between the physical partition and the virtual partition.
Optionally, the method further includes: receiving a storage node capacity expansion/reduction instruction, and registering/deleting at least one storage node in a management node of the computing cluster;
and adjusting the corresponding relation between the storage nodes and the virtual nodes according to the current total number of the storage nodes and the total number of the virtual nodes.
Optionally, the method includes:
the scheduling layer in the metadata cluster determines a service node which processes the metadata service request in the service layer according to the service type in the metadata service request transmitted by a management node;
feeding back the identification of the service node to the management node, so that the management node establishes communication connection with the service node according to the identification;
the service node receives the metadata service request transmitted by the management node so as to modify and store the metadata structure in the storage layer;
and after the metadata structure is modified, the storage layer transmits the execution result to the service layer so as to feed back the execution result to the management node through the service layer.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a data processing electronic device.
The electronic device of the embodiment of the invention comprises: one or more processors; a storage device, configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to implement any of the data processing methods described above.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program implementing any of the data processing methods described above when executed by a processor.
According to the scheme provided by the invention, one embodiment has the following advantages or beneficial effects: an MPP database architecture that separates computation from storage on top of shared storage is divided, by function, into a management module and a user module. Data is stored on the shared storage and metadata on the metadata cluster, and metadata and data are obtained through inter-cluster communication, which improves operating efficiency. Even if the storage nodes later need to be expanded, only the correspondence between storage nodes and virtual nodes changes; the data associated with the virtual nodes is not migrated, which overcomes the drawback that existing clusters must redistribute data during expansion or reduction.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic main flow chart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a metadata cluster structure according to an embodiment of the present invention;
FIG. 3 is a flow chart diagram of a data processing method according to an embodiment of the invention;
FIGS. 4(a) to 4(c) are schematic views showing the construction of the hash ring;
FIG. 5 is a flow diagram illustrating an alternative data processing method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a capacity expansion node according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a metadata cluster operation mechanism according to an embodiment of the present invention;
FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 9 is a schematic block diagram of a computer system suitable for use with a mobile device or server implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Referring to fig. 1, a main architecture diagram of a data processing system according to an embodiment of the present invention is shown, which is mainly divided into two parts, a user module and a management module, wherein:
1. The management module, also called the management console, is the entry point for full-lifecycle cluster management. It monitors the operation and maintenance information of the user module and manages create, delete, update and query operations at the metadata-cluster and computing-cluster level. Its main functions include calling the IaaS-layer interface to create clusters; daily operations such as automatic deployment, cluster start/stop, capacity expansion and upgrades; and displaying operation and maintenance monitoring information such as data state, cluster health state and fault recovery state.
2. The user module is divided into three parts: a metadata cluster, a computing cluster, and shared storage.
1) The metadata cluster only stores metadata and can be divided into a scheduling layer, a service layer and a storage layer according to a logical architecture, which is specifically shown in fig. 2.
The scheduling layer is the first layer; it manages metadata (the attribute information of data) and mainly performs global coordination and scheduling, including access control, query optimization and the like. Specifically, according to the service type of a metadata service request transmitted by the management node, it determines the service node in the service layer that will process the request, and transmits that node's identifier to the management node.
The service layer, on the second layer, consists of a group of stateless service nodes; it receives metadata service requests transmitted by the computing cluster, performs read-write-modify operations on the metadata structure by persisting them to the storage layer, and feeds back the execution result received from the storage layer to the computing cluster.
The storage layer, on the third layer, performs read-write-modify operations on the metadata structure and transmits the execution result to the service layer once execution finishes. In addition, the storage layer is responsible for multi-copy storage of metadata, achieving load balance overall and guaranteeing high availability of the cluster.
2) The computing cluster, located between the metadata cluster and the shared storage, comprises a plurality of sub-computing clusters. From the user's perspective, each sub-computing cluster is an independent MPP database and comprises a management node (master) and a plurality of computing nodes:
the management node is an inlet of the database and is responsible for scheduling the work of all systems of the whole database, a request sent by a user is analyzed and optimized by the management node, and the request is distributed to the computing nodes according to a task distribution scheme of an optimal query plan. And if the weights of all middle school students need to be acquired, determining at least one piece of metadata corresponding to the business requirement.
Secondly, each computing node executes its operations according to the query plan: for example, it acquires the data corresponding to at least one piece of metadata from the shared storage, performs logic computation on the acquired data, and feeds the result back to the management node, which summarizes the results and forwards them to the client.
As shown in fig. 1, a cache layer is also arranged on the computing cluster in this solution: in front of the shared storage, the local storage space of the computing cluster is used to cache data and metadata that the cluster accesses frequently (access frequency greater than or equal to a preset access frequency). The database maintains an access-frequency statistics system table that records and identifies hot data, and this hot data is what the cache layer stores.
When such data or metadata is accessed, i.e. on a cache hit, the computing node only needs to read from its local disk and does not need to remotely access the distributed storage system on the shared storage or the metadata cluster, thereby ensuring that I/O throughput does not degrade. This caching mechanism strongly guarantees efficient access to common hot data and metadata.
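The cache-layer behavior described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the patent's implementation: the class name `HotCache`, the threshold value, and the `remote_fetch` callback are all assumptions made for the example.

```python
# Hypothetical sketch of a hot-data cache layer: a compute node counts
# accesses per key and keeps a local copy of any entry whose frequency
# reaches a preset threshold, so later reads skip remote storage.

class HotCache:
    def __init__(self, threshold=3):
        self.threshold = threshold   # preset access frequency
        self.counts = {}             # key -> observed access count
        self.local = {}              # locally cached hot entries

    def get(self, key, remote_fetch):
        """Return a value, caching it locally once it becomes 'hot'."""
        if key in self.local:        # cache hit: no remote I/O needed
            return self.local[key]
        self.counts[key] = self.counts.get(key, 0) + 1
        value = remote_fetch(key)    # remote access to shared storage
        if self.counts[key] >= self.threshold:
            self.local[key] = value  # promote to hot data
        return value

cache = HotCache(threshold=2)
shared_storage = {"row1": "data1"}
cache.get("row1", shared_storage.__getitem__)  # remote fetch, count = 1
cache.get("row1", shared_storage.__getitem__)  # count = 2, now cached locally
```

A real implementation would also bound the cache size and evict stale entries; the point here is only the frequency-threshold promotion the text describes.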
3) The shared storage is used for storing user data in a unified manner, and the implementation mechanism in the shared storage is described with reference to fig. 3 to 5.
In the scheme, the computing resources of the database use physical resources in the computing cluster, and user data are stored in unified shared storage, so that decoupling of computing and storage is realized, and strict physical isolation is performed.
The method provided by the embodiment is based on the MPP database calculation and storage separation architecture of the shared storage, and is divided into a management module and a user module according to functions. Data is stored on a shared storage, metadata is stored on a metadata cluster, and metadata/data acquisition needs to be communicated among the clusters, so that the operation efficiency is improved.
Referring to fig. 3, a flow chart of a data processing method according to an embodiment of the present invention is shown, which includes the following steps:
s301: the shared storage receives data transmitted by a computing cluster, determines a record where the data is located, and performs hash processing on a key value of the record to obtain a hash value;
s302: determining a physical partition in a hash ring corresponding to the hash value to store the data in a storage node corresponding to the physical partition.
In the above embodiment, in steps S301 and S302, the key value that determines data distribution in an MPP database is called the distribution key; generally, a column of the structured data table is chosen as the table's distribution key. The data distribution strategy adopted by GP4 and GP5 (versions 4 and 5 of the Greenplum database) is modulo: a hash value is calculated from the value of the distribution key of each record, then taken modulo the cluster size (number of partitions), and the record is stored in the partition corresponding to the remainder. Provided the data itself is not skewed, this strategy distributes it roughly evenly across the partitions.
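The hash-modulo strategy just described can be sketched as follows. The hash function and key names are illustrative choices, not the patent's:

```python
# Minimal sketch of hash-modulo distribution: hash the record's
# distribution key, take the remainder modulo the partition count.
import hashlib

def partition_for(dist_key: str, num_partitions: int) -> int:
    h = int(hashlib.md5(dist_key.encode()).hexdigest(), 16)
    return h % num_partitions

# Place four records across 4 partitions.
placement = {k: partition_for(k, 4) for k in ("r1", "r2", "r3", "r4")}
# Recomputing after a resize to 5 partitions shows which records
# would have to move under this strategy.
moved = [k for k in placement if partition_for(k, 5) != placement[k]]
```

Because the modulus itself changes on resize, most mappings are invalidated, which is exactly the weakness the next paragraphs address.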
In the hash-modulo mode, when the cluster is expanded or shrunk, the one-way nature of the hash operation means the original mapping is almost completely invalidated, and the data on the storage nodes must be hash-distributed again. Data moves between nodes over the network, which incurs heavy network and I/O overhead: on one hand this seriously affects queries running during the resize, and on the other hand it lengthens the time window needed for expansion or reduction.
To solve these problems, a consistent hashing algorithm is introduced. Consistent hashing still uses a standard hash function to compute the hash value; unlike the modulo approach, it divides the computed hash values into different physical partitions and maps those partitions to different storage nodes, thereby overcoming the drawbacks of the existing modulo approach.
The hash value returned by the standard hash function is mapped into the integer space [0, 2^32-1] (this value is merely an example, assuming the hash value is a 32-bit unsigned integer); joining the ends of this space gives the hash ring shown in FIG. 4(a). Data to be stored is mapped into the ring address space by the hash algorithm, for example (each object is a piece of data to be stored):
hash(object1)=key1
hash(object2)=key2
hash(object3)=key3
hash(object4)=key4
the calculated key value is stored in the corresponding position of the ring, and the mapping of the node hash shown in fig. 4(b) can be obtained.
The storage node of the consistent hash can map the node name node to the annular address space through the hash algorithm:
hash(node1)=pos1
hash(node2)=pos2
each node corresponds to a node storing data, and according to the node position (i.e., the first position):
node1 stores the data in the pos1 → pos2 range (key2, key3)
node2 stores the data in the pos2 → pos1 range (key4, key1)
It can be seen that each storage node only stores data between its location and the next adjacent location in the hash ring, and the two locations form a physical partition, and the final structure is shown in fig. 4 (c).
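The ring construction and lookup described above can be sketched as follows. This is a generic consistent-hashing illustration under the stated assumptions (MD5 folded into a 32-bit space); the class and function names are not from the patent:

```python
# Sketch of a consistent hash ring: node names are hashed onto the ring,
# and each key belongs to the node whose position starts the arc
# containing the key (i.e., the nearest node position at or before it).
import bisect
import hashlib

RING = 2 ** 32  # hash values mapped onto [0, 2^32 - 1]

def h(name: str) -> int:
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % RING

class HashRing:
    def __init__(self, nodes):
        self.positions = sorted((h(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        pos = h(key)
        keys = [p for p, _ in self.positions]
        # Largest node position <= key position; index -1 wraps around
        # the ring to the last node, matching the circular layout.
        i = bisect.bisect_right(keys, pos) - 1
        return self.positions[i][1]

ring = HashRing(["node1", "node2"])
owner = ring.node_for("object1")  # one of node1 / node2
```

The benefit over modulo: removing one node only reassigns the keys that node owned; keys owned by the surviving nodes keep their placement.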
Therefore, when the shared storage receives new data, it first hashes the data's distribution key in the table, then determines the physical partition in the hash ring corresponding to the hash value, and finally stores the data in the storage node corresponding to that physical partition, completing the data distribution.
In the method provided by the embodiment, the calculated hash value is divided into different physical partitions, and the different physical partitions are mapped to different storage nodes, so that the whole is displayed in the form of a hash ring, and the defect that data needs to be redistributed when the existing cluster is subjected to capacity expansion and capacity reduction is overcome.
Referring to fig. 5, a schematic flow chart of an alternative data processing method according to an embodiment of the present invention is shown, including the following steps:
s501: carrying out hash processing on the node name of a virtual node to obtain a second hash value, and determining a second position corresponding to the second hash value in a hash ring;
s502: according to the clockwise direction, obtaining the next position adjacent to the second position in the hash ring, constructing a virtual partition according to the second position and the next position, and establishing a mapping relation between the virtual node and the virtual partition;
s503: carrying out hash processing on a node name of a storage node to obtain a first hash value, and determining a first position corresponding to the first hash value in the hash ring;
s504: according to the clockwise direction, acquiring a next position adjacent to the first position in the hash ring, constructing a physical partition according to the first position and the next position, and establishing a mapping relation between the physical partition and the storage node;
s505: establishing a mapping relation between a physical partition and a virtual partition based on a corresponding relation between a virtual node and a storage node; wherein one physical partition corresponds to at least one virtual partition;
s506: the shared storage receives data transmitted by a computing cluster, determines a record where the data is located, and performs hash processing on a key value of the record to obtain a hash value;
s507: determining a virtual partition corresponding to the hash value in the hash ring, and determining a physical partition based on a mapping relation between the physical partition and the virtual partition so as to store the data into a storage node corresponding to the physical partition.
In the above embodiment, step S506 may refer to the description of step S301 shown in fig. 3, and is not described herein again.
In the above embodiment, regarding steps S501 to S505 and S507, adding or removing storage nodes inevitably moves data on some of the storage nodes, which unbalances resources in the system and affects cluster operating efficiency. Virtual nodes are therefore introduced to solve this problem.
A fixed number of virtual nodes is preset (far larger than the number of actual physical partitions). The node name of each virtual node is processed by the hash algorithm to obtain a second hash value, and the position pos corresponding to that value in the hash ring is determined. Then, in the same way that physical partitions are determined, a virtual partition is constructed clockwise from this pos to the next position, and a mapping between the virtual node and the virtual partition is established.
Through these steps, records are distributed evenly over the virtual partitions, which are then mapped to physical partitions through the correspondence between virtual nodes and storage nodes. Since the number of virtual partitions far exceeds the number of physical partitions, one physical partition usually corresponds to multiple virtual partitions (e.g., 1:4), and the data of those virtual partitions is stored on the corresponding physical partition. This is the consistent-hash data distribution policy of GP6.
In this scheme, the data of each virtual partition is stored as an independent file, persisted on the shared storage, and the mapping from data to virtual partition is retained. Under this policy, expanding or shrinking the physical partitions only dynamically adjusts the virtual-partition-to-physical-partition mapping; the underlying data files on the shared storage need not change, and no data reads/writes or network transfers are involved. The time window required for expansion is fixed (start a new virtual machine and register it with the master node), and adding or deleting nodes bears no relation to the existing data volume in the cluster, truly achieving second-level expansion/reduction.
As shown in fig. 6, the file of each virtual node is stored independently on the shared storage, and the correspondence between storage node 1, storage node 2, and the virtual nodes is stored. When the management module issues a capacity-expansion instruction, storage node 3 is registered with the master node; part of the correspondences of storage node 1 and storage node 2 are invalidated and adjusted to point some of the virtual nodes at storage node 3, and the process involves no data movement.
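The expansion step of fig. 6 amounts to rewriting part of that mapping table. A minimal sketch, in which the node names and the even-share reassignment policy are assumptions:

```python
def expand(vnode_to_storage: dict, new_node: str) -> dict:
    """Register a new storage node and reassign an even share of virtual
    nodes to it. Only the mapping changes; the per-virtual-node data files
    on shared storage are untouched, so no data moves over the network."""
    mapping = dict(vnode_to_storage)
    old_nodes = sorted(set(mapping.values()))
    total_nodes = len(old_nodes) + 1
    share = len(mapping) // total_nodes  # virtual nodes the new node takes

    # Invalidate part of the old correspondences and point them at new_node.
    victims = sorted(mapping)[:share]
    for vnode in victims:
        mapping[vnode] = new_node
    return mapping

before = {f"vnode-{i}": ["storage-node-1", "storage-node-2"][i % 2]
          for i in range(16)}
after = expand(before, "storage-node-3")
```

With 16 virtual nodes and a third storage node, 16 // 3 = 5 virtual nodes are reassigned; the remaining 11 keep their old owner, and no file on shared storage is rewritten.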
In general, the number of virtual partitions that one physical partition corresponds to is determined by the ratio of the total number of storage nodes to the total number of virtual nodes, and this value changes as storage nodes are added or removed.
In the method provided by this embodiment, a fixed number of virtual nodes is set, together with the correspondence between the virtual nodes and the data. Even when nodes are added or removed, the only correspondence that changes is the one between storage nodes and virtual nodes; no data migration is involved, network and I/O consumption is avoided, and low-cost second-level scaling is truly achieved.
Referring to fig. 7, which shows a schematic diagram of a metadata-cluster operation mechanism according to an embodiment of the present invention, where master1 is the master node of a sub-computing cluster, the mechanism includes the following steps:
1. master1 of a computing cluster sends a metadata service request to the scheduling layer of the metadata cluster. Because each catalog node has a different division of work and handles a different type of request, after receiving the service request the scheduling layer determines, according to the service type in the request, the catalog node in the service layer that can process it.
2. The scheduling layer feeds the id of the designated service node back to master1;
3. master1 establishes a connection with the corresponding service node in the service layer according to the service-node id fed back by the scheduling layer, and then sends a metadata read-write request;
4. The service node receives the metadata service request transmitted by master1, performs a table-locking operation according to the content of the request, and performs the metadata read-write modification by writing to the storage layer.
Table locks occur on insert, update, and delete. The database uses an exclusive blocking mechanism: when executing these statements, the table is locked until a commit (which saves the modifications made by the transaction to the database), a rollback, or an exit by the database user occurs. For example, if program A performs an insert on tableA and program B also performs an insert on tableA before A has committed, a "resource busy" exception occurs; this is the table lock. Table locks arise under concurrency rather than parallelism: while one thread is operating on the database, another thread cannot.
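This exclusive-until-commit behavior can be observed with any transactional engine; the sketch below uses SQLite purely as an illustration (the patent does not name a specific database), where a second connection's insert fails with a "locked" error until the first connection commits:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Program A: insert without committing, holding the write lock on the table.
conn_a = sqlite3.connect(path, timeout=0.2)
conn_a.execute("CREATE TABLE tableA (id INTEGER)")
conn_a.commit()
conn_a.execute("INSERT INTO tableA VALUES (1)")  # transaction open, uncommitted

# Program B: the same insert now hits the lock and raises "database is locked",
# SQLite's analogue of the "resource busy" exception described above.
conn_b = sqlite3.connect(path, timeout=0.2)
try:
    conn_b.execute("INSERT INTO tableA VALUES (2)")
    blocked = False
except sqlite3.OperationalError:
    blocked = True

# Once A commits (saving its modifications), B's insert succeeds.
conn_a.commit()
conn_b.execute("INSERT INTO tableA VALUES (2)")
conn_b.commit()
```

The short `timeout` makes B fail fast instead of waiting the default five seconds for the lock to be released.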
5. After the metadata read-write modification is finished, the storage layer feeds the execution result back to the service layer.
6. The service layer feeds the execution result back to master1, completing the processing of the metadata service request.
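Steps 1-6 above can be condensed into a small routing sketch; the service types, node ids, and in-memory stores below are invented for illustration only:

```python
# Scheduling layer: routes a request to the catalog node that handles its
# service type (steps 1-2), mirroring the different divisions of work.
SERVICE_TABLE = {            # service type -> service-node id (assumed names)
    "table_meta": "svc-1",
    "partition_meta": "svc-2",
}

METADATA_STORE = {}          # storage layer: persisted metadata structures
LOCKS = set()                # tables currently locked by an open operation

def handle_request(request: dict) -> dict:
    # Steps 1-2: the scheduling layer picks the service node by service type.
    node_id = SERVICE_TABLE[request["service_type"]]

    # Steps 3-4: the service node locks the table named in the request, then
    # performs the read-write modification against the storage layer.
    table = request["table"]
    if table in LOCKS:
        return {"node": node_id, "ok": False, "error": "resource busy"}
    LOCKS.add(table)
    try:
        METADATA_STORE[table] = request["payload"]
        result = {"node": node_id, "ok": True}    # step 5: storage -> service
    finally:
        LOCKS.discard(table)                      # commit releases the lock
    return result                                 # step 6: service -> master1

resp = handle_request({"service_type": "table_meta",
                       "table": "t_orders",
                       "payload": {"columns": ["id", "amount"]}})
```

A second request against a table already in `LOCKS` would receive the "resource busy" response, matching the table-lock behavior described above.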
The implementation details of the components described in the embodiments of the present invention have been described in detail in the above methods, and are therefore not repeated here.
FIG. 8 illustrates an exemplary system architecture 800 to which embodiments of the invention may be applied.
As shown in fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805 (by way of example only). The network 804 is used to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. The network 804 may include various types of connections, such as wired links, wireless communication links, or fiber-optic cables.
A user may use the terminal devices 801, 802, 803 to interact with a server 805 over a network 804 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 801, 802, 803.
The terminal devices 801, 802, 803 may be various electronic devices having display screens and supporting web browsing, and the server 805 may be a server providing various services.
It is to be noted that the method provided by the embodiment of the present invention is generally executed by the server 805, and accordingly, the apparatus is generally disposed in the server 805.
It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 9, shown is a block diagram of a computer system 900 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 901.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor including a user module and a management module. The names of these modules do not in some cases limit the modules themselves; for example, the user module may also be described as a "database entry module".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following:
the shared storage receives data transmitted by a computing cluster, determines a record where the data is located, and performs hash processing on a key value of the record to obtain a hash value;
determining a physical partition in a hash ring corresponding to the hash value to store the data in a storage node corresponding to the physical partition.
Compared with the prior art, the technical solution provided by the embodiments of the present invention has at least the following beneficial effects:
1. The method breaks the limitation of single-cluster scale under massive-data scenarios, realizes horizontal linear growth in the number of nodes, avoids the massive inter-cluster data-copy operations caused by splitting clusters, and saves ETL scheduling resources.
2. While sharing the same set of metadata, strict resource isolation is achieved among the computing clusters; the CPU and memory usage of users can be reasonably planned, avoiding contention for computing resources among different users.
3. Traditional database concurrency is limited by the hardware resources of the nodes; the invention realizes linear superposition of concurrency through a multi-computing-cluster mode.
4. Data is persistently stored in three copies based on the shared storage, so database-cluster failures do not affect data storage, providing a favorable guarantee for high data availability.
5. The files of the virtual nodes are stored independently, and the correspondence between physical nodes and virtual nodes is stored, so that expansion/contraction only requires adjusting this correspondence, without data redistribution, truly achieving second-level scaling.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A data processing system, comprising a management module and a user module, wherein: the user module comprises a metadata cluster, a computing cluster, and a shared storage; the computing cluster is located between the metadata cluster and the shared storage, and is used for obtaining metadata from the metadata cluster and persisting data to the shared storage, wherein the metadata is used to describe attribute information of the data; the management module is used to monitor the operation and maintenance information of the user module, and to manage add, delete, modify, and query operations at the metadata-cluster and computing-cluster level.

2. The system according to claim 1, wherein the metadata cluster comprises a scheduling layer, a service layer, and a storage layer; one side of the scheduling layer is connected to the computing cluster and the other side to the service layer, for determining, according to the service type in a metadata service request transmitted by the computing cluster, the service node in the service layer that processes the metadata service request, so as to transmit the identifier of the service node to the computing cluster; the service layer consists of a group of stateless service nodes, with one side connected to the computing cluster and the other side to the storage layer, for receiving the metadata service request transmitted by the computing cluster, performing read-write and modification operations on the metadata structure on the storage layer, and feeding back the execution result received from the storage layer to the computing cluster; the storage layer is connected to the service layer, for performing the read-write and modification operations on the metadata structure and, after execution is completed, transmitting the execution result to the service layer.

3. The system according to claim 2, wherein the storage layer is further responsible for storing metadata in multiple copies.

4. The system according to claim 1, wherein the computing cluster comprises a plurality of sub-computing clusters, each sub-computing cluster being an interface through which a user logs in to the user module and comprising one management node and a plurality of computing nodes; the management node obtains metadata from the metadata cluster so as to determine, upon receiving a service requirement, at least one piece of metadata corresponding to the service requirement, and aggregates and forwards the calculation results transmitted by the computing nodes; the computing nodes obtain the data corresponding to the at least one piece of metadata from the shared storage, perform logical calculation on the obtained data, and transmit the calculation results to the management node.

5. The system according to claim 1, wherein the computing cluster is further provided with a cache layer for caching data and metadata frequently accessed by the computing cluster, wherein frequently accessed means an access frequency greater than or equal to a preset access frequency.

6. A data processing method using the data processing system according to claim 1, comprising: the shared storage receiving data transmitted by the computing cluster, determining the record in which the data is located, and hashing the key value of the record to obtain a hash value; determining the physical partition in a hash ring corresponding to the hash value, so as to store the data in the storage node corresponding to the physical partition.

7. The method according to claim 6, further comprising, before the shared storage receives the data transmitted by the computing cluster: hashing the node name of a storage node to obtain a first hash value, and determining a first position in the hash ring corresponding to the first hash value; obtaining, in a clockwise direction, the next position adjacent to the first position in the hash ring, constructing a physical partition from the first position and the next position, and establishing a mapping relationship between the physical partition and the storage node.

8. The method according to claim 6 or 7, wherein one physical partition corresponds to at least one virtual partition; the method further comprising: hashing the node name of a virtual node to obtain a second hash value, and determining a second position in the hash ring corresponding to the second hash value; obtaining, in a clockwise direction, the next position adjacent to the second position in the hash ring, constructing a virtual partition from the second position and the next position, and establishing a mapping relationship between the virtual node and the virtual partition; establishing a mapping relationship between physical partitions and virtual partitions based on the correspondence between virtual nodes and storage nodes; wherein determining the physical partition in the hash ring corresponding to the hash value comprises: determining the virtual partition in the hash ring corresponding to the hash value, and determining the physical partition based on the mapping relationship between physical partitions and virtual partitions.

9. The method according to claim 8, further comprising: receiving a storage-node expansion/contraction instruction, and registering/deleting at least one storage node in a management node of the computing cluster; adjusting the correspondence between storage nodes and virtual nodes according to the current total number of storage nodes and the total number of virtual nodes.

10. The method according to claim 6, comprising: the scheduling layer in the metadata cluster determining, according to the service type in a metadata service request transmitted by a management node, the service node in the service layer that processes the metadata service request; feeding back the identifier of the service node to the management node, so that the management node establishes a communication connection with the service node according to the identifier; the service node receiving the metadata service request transmitted by the management node, so as to modify and save the metadata structure in the storage layer; after the metadata structure is modified, the storage layer transmitting the execution result to the service layer, so as to feed back the execution result to the management node through the service layer.

11. An electronic device, comprising: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 6-10.

12. A computer-readable medium on which a computer program is stored, wherein when the program is executed by a processor, the method according to any one of claims 6-10 is implemented.
CN202011019732.7A 2020-09-24 2020-09-24 A data processing method and system Active CN112199427B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011019732.7A CN112199427B (en) 2020-09-24 2020-09-24 A data processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011019732.7A CN112199427B (en) 2020-09-24 2020-09-24 A data processing method and system

Publications (2)

Publication Number Publication Date
CN112199427A true CN112199427A (en) 2021-01-08
CN112199427B CN112199427B (en) 2024-12-27

Family

ID=74008057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011019732.7A Active CN112199427B (en) 2020-09-24 2020-09-24 A data processing method and system

Country Status (1)

Country Link
CN (1) CN112199427B (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867035A (en) * 2012-08-28 2013-01-09 浪潮(北京)电子信息产业有限公司 High-availability method and device of distributed document system cluster
CN103795801A (en) * 2014-02-12 2014-05-14 浪潮电子信息产业股份有限公司 Metadata group design method based on real-time application group
CN109491807A (en) * 2018-11-01 2019-03-19 浪潮软件集团有限公司 Data exchange method, device and system
CN109522283A (en) * 2018-10-30 2019-03-26 深圳先进技术研究院 A kind of data de-duplication method and system
CN109739684A (en) * 2018-11-20 2019-05-10 清华大学 The copy restorative procedure and device of distributed key value database based on vector clock
CN110471613A (en) * 2018-05-09 2019-11-19 杭州海康威视系统技术有限公司 The method of storing data, the method, apparatus and system for reading data


Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112698926A (en) * 2021-03-25 2021-04-23 成都新希望金融信息有限公司 Data processing method, device, equipment, storage medium and system
CN113886037A (en) * 2021-09-14 2022-01-04 北京东方金信科技股份有限公司 A method and system for realizing data distribution in a distributed database cluster
CN113934707A (en) * 2021-10-09 2022-01-14 京东科技信息技术有限公司 Cloud native database, database capacity expansion method, database capacity reduction method and device
CN114003644A (en) * 2021-10-21 2022-02-01 河南星环众志信息科技有限公司 Distributed transaction processing method, device, medium and database system
CN114020836A (en) * 2021-10-28 2022-02-08 国电南京自动化股份有限公司 Distributed industrial SCADA system measurement data processing method based on time sequence library
CN114020836B (en) * 2021-10-28 2025-02-07 国电南京自动化股份有限公司 A distributed industrial SCADA system measurement data processing method based on time series library
CN114357049A (en) * 2022-01-07 2022-04-15 苏州浪潮智能科技有限公司 Storage cluster interconnection method and device, computer equipment and storage medium
CN114357049B (en) * 2022-01-07 2024-01-19 苏州浪潮智能科技有限公司 Storage cluster interconnection method, device, computer equipment and storage medium
WO2024119504A1 (en) * 2022-12-09 2024-06-13 华为技术有限公司 Data processing method, apparatus, device and system
CN116049137A (en) * 2022-12-26 2023-05-02 海尔优家智能科技(北京)有限公司 Method and device for managing time-series database system, node device, storage medium
CN116095099A (en) * 2023-01-20 2023-05-09 广东省中山市质量计量监督检测所 Machine vision-based mechanical part quality inspection system
CN116204590A (en) * 2023-02-28 2023-06-02 北京人大金仓信息技术股份有限公司 Data processing method, readable storage medium and computer device for database cluster
WO2025002006A1 (en) * 2023-06-26 2025-01-02 华为云计算技术有限公司 Transaction processing method and system
WO2025086688A1 (en) * 2023-10-27 2025-05-01 华为云计算技术有限公司 Method for processing data, system, and computing device cluster
CN117573614A (en) * 2023-11-16 2024-02-20 天翼云科技有限公司 A system and method for reducing metadata and migrating only a small amount of data when adding or deleting nodes
CN117971506A (en) * 2024-03-29 2024-05-03 天津南大通用数据技术股份有限公司 MPP database query task balancing method, system, equipment and medium
CN117971506B (en) * 2024-03-29 2024-06-18 天津南大通用数据技术股份有限公司 MPP database query task balancing method, system, equipment and medium
CN117997896A (en) * 2024-04-03 2024-05-07 环球数科集团有限公司 A data transmission, storage and access system based on IPFS protocol
CN117997896B (en) * 2024-04-03 2024-06-04 环球数科集团有限公司 A data transmission, storage and access system based on IPFS protocol

Also Published As

Publication number Publication date
CN112199427B (en) 2024-12-27

Similar Documents

Publication Publication Date Title
CN112199427B (en) A data processing method and system
US10671695B2 (en) System and method for storing and processing database requests
US20200242129A1 (en) System and method to improve data synchronization and integration of heterogeneous databases distributed across enterprise and cloud using bi-directional transactional bus of asynchronous change data system
US8108352B1 (en) Data store replication for entity based partition
AU2013271538B2 (en) Data management and indexing across a distributed database
US9489443B1 (en) Scheduling of splits and moves of database partitions
JP7549137B2 (en) Transaction processing method, system, device, equipment, and program
US7076553B2 (en) Method and apparatus for real-time parallel delivery of segments of a large payload file
US10922303B1 (en) Early detection of corrupt data partition exports
CN112948178A (en) Data processing method, device, system, equipment and medium
US11461201B2 (en) Cloud architecture for replicated data services
CN111459913B (en) Capacity expansion method and device of distributed database and electronic equipment
CN114003580A (en) Database construction method and device applied to distributed scheduling system
US11609933B1 (en) Atomic partition scheme updates to store items in partitions of a time series database
CN110826993A (en) Project management processing method, device, storage medium and processor
CN118673086B (en) Data processing method, device, electronic device and computer readable storage medium
CN105511966B (en) A method and system for business segmentation and optimization of database clusters
US11157454B2 (en) Event-based synchronization in a file sharing environment
CN120560825A (en) Performance optimization method, device, computer equipment, readable storage medium and program product of power data system
Sukhija et al. Load balancing and fault tolerance mechanisms for scalable and reliable big data analytics
Chejarla A Novel Stateful Orchestration Pattern for Data Affinity and Transactional Integrity in Sharded Backend Architectures
CN117560370A (en) Fragment storage method and device
CN119537481A (en) Database synchronization method, device, equipment, medium and program product
CN115550458A (en) Log processing method and related device
HK40037752A (en) Transaction processing method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant