WO2024103714A1 - Data processing method, system, apparatus, and related device - Google Patents
- Publication number
- WO2024103714A1 (application PCT/CN2023/100673)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- metadata
- model
- management device
- permission
- mapping relationship
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
Definitions
- The present application relates to the field of big data technology, and in particular to a data processing method, system, apparatus, and related device.
- A storage-computing separation architecture is a layered architecture that separates storage capacity from computing capacity and interconnects the two through a network. It has become one of the mainstream technology trends in recent years. The architecture includes a storage layer and a computing layer.
- The storage layer includes at least one piece of storage hardware for persistent storage of data; in practice, the storage layer holds a large amount of data, forming a data lake.
- The computing layer includes at least one computing engine for reading and writing data in the storage layer and performing the corresponding computations.
- Conventionally, the metadata corresponding to the data in the storage layer is deployed in the computing layer. As a result, when the computing layer includes multiple computing engines, the metadata must be replicated and configured in each computing engine so that the engines can share the data in the same storage layer. However, copying and migrating metadata between computing engines creates redundant data and easily leads to data inconsistency.
- To address this, a management layer (or data analysis layer) can be added to the storage-computing separation architecture.
- The management layer is connected to the computing layer and the storage layer through the network, so that the metadata corresponding to the data in the storage layer is managed in a unified way.
- Each computing engine in the computing layer obtains metadata through the metadata model and permission model in the management layer, and uses the metadata to read and write data in the storage layer.
- The metadata model refers to the metadata structure adopted by the management layer, and the permission model refers to the permission definition corresponding to that metadata structure.
- However, when the management layer connects to a computing engine in the computing layer, the computing engine must be able to adapt to the metadata model and permission model fixed in the management layer. Some computing engines cannot adapt to these built-in models and therefore have difficulty accessing the storage layer, which limits the scalability of the management layer in connecting to computing engines.
- the embodiment of the present application provides a data processing method to improve the scalability of the data processing system docking with the computing engine.
- the present application also provides a corresponding data processing system, a management device, a computing device cluster, a computer-readable storage medium, and a computer program product.
- An embodiment of the present application provides a data processing method applied to a data processing system. The data processing system includes a computing engine, a management device, and a storage device. The management device has a built-in first metadata model and first permission model, both of which are adapted to the storage device.
- The management device receives an access request, sent by the computing engine, for metadata of target data. In response to the access request, the management device determines the metadata of the target data according to a first mapping relationship, which is a mapping between the first metadata model and a second metadata model; the second metadata model is adapted to the computing engine, and the determined metadata of the target data satisfies the second metadata model. The management device also authenticates the access request according to a second mapping relationship, which is a mapping between the first permission model and a second permission model; the second permission model is adapted to the computing engine.
- After the access request is authenticated, the metadata of the target data is sent to the computing engine.
- In this way, the computing engine can adapt to the metadata model and permission model built into the management device through the mapping between metadata models and the mapping between permission models. This removes the adaptation constraints that the built-in models impose on the computing engine and improves the scalability of the data processing system in docking with computing engines.
- The management device can also receive a metadata update request sent by the computing engine. In response to the metadata update request, the management device translates the original metadata carried in the request into target metadata according to the first mapping relationship; the original metadata satisfies the second metadata model, and the target metadata satisfies the first metadata model. The management device authenticates the metadata update request according to the second mapping relationship and, after the request is authenticated, updates the target metadata to the management device, for example by persistently storing it.
- In this way, the computing engine can update the metadata in the management device through the mapping relationships between the metadata models and between the permission models, so that the computing engine can subsequently write new data to the storage device.
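- The update path described above can be illustrated with a minimal Python sketch. Every name in it (the field names, the permission names, the `ManagementDevice` class, and the in-memory `store`) is a hypothetical placeholder chosen for illustration, not something defined by the application; the dictionary merely stands in for persistent storage.

```python
# Hypothetical mapping relationship 1, read in the engine-to-built-in direction:
# field of the engine's (second) metadata model -> field of the built-in (first) model.
FIELD_MAP_2_TO_1 = {"project": "catalog", "schema": "database", "table": "table", "path": "location"}

# Hypothetical mapping relationship 2: engine permission -> built-in permission.
PERMISSION_MAP_2_TO_1 = {"WRITE": "ALTER", "READ": "SELECT"}

# Hypothetical grants, expressed in the built-in (first) permission model.
GRANTS = {"bob": {"ALTER"}, "carol": {"SELECT"}}


class ManagementDevice:
    """Toy stand-in for the management device; the dict plays the role of persistent storage."""

    def __init__(self):
        self.store = {}

    def update_metadata(self, user, engine_permission, original_metadata):
        # Step 1: translate the original metadata (second model) into target metadata (first model).
        # An unmapped field raises KeyError in this sketch.
        target_metadata = {FIELD_MAP_2_TO_1[name]: value for name, value in original_metadata.items()}
        # Step 2: authenticate the request through mapping relationship 2.
        builtin_permission = PERMISSION_MAP_2_TO_1.get(engine_permission)
        if builtin_permission is None or builtin_permission not in GRANTS.get(user, set()):
            raise PermissionError(f"{user} may not perform {engine_permission}")
        # Step 3: persist the target metadata only after authentication has passed.
        key = (target_metadata["catalog"], target_metadata["database"], target_metadata["table"])
        self.store[key] = target_metadata
        return target_metadata
```

- In this sketch, a request from `bob` with the engine-side permission `WRITE` is stored under the built-in field names, while the same request from `carol` is rejected before anything is written.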
- The management device may also output a configuration interface, which may be presented to the user through a client that the data processing system exposes externally. In response to a first operation performed by the user on the configuration interface, the management device establishes the first mapping relationship between the first metadata model and the second metadata model; in response to a second operation performed by the user on the configuration interface, it establishes the second mapping relationship between the first permission model and the second permission model.
- The management device may also generate an access control policy for the second metadata model and the second permission model in response to a third operation performed by the user on the configuration interface, and use the access control policy to constrain the access operations of the computing engine. In this way, interface interaction makes configuration more convenient for the user.
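- One way to picture what the three configuration operations produce is the following Python sketch. The `EngineConfiguration` class, its method names, and the sample field and permission names are all assumptions made for illustration; the application does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field


@dataclass
class EngineConfiguration:
    """Hypothetical record of the results of the three configuration operations."""

    metadata_mapping: dict = field(default_factory=dict)    # mapping relationship 1
    permission_mapping: dict = field(default_factory=dict)  # mapping relationship 2
    policies: set = field(default_factory=set)              # access control policy entries

    def map_metadata_field(self, first_model_field, second_model_field):
        # First operation: relate a field of the first metadata model to one of the second.
        self.metadata_mapping[first_model_field] = second_model_field

    def map_permission(self, first_model_permission, second_model_permission):
        # Second operation: relate a permission of the first model to one of the second.
        self.permission_mapping[first_model_permission] = second_model_permission

    def add_policy(self, principal, resource_kind, permission):
        # Third operation: the policy is stated in terms of the second models,
        # so this sketch requires both terms to have been mapped already.
        if resource_kind not in self.metadata_mapping.values():
            raise ValueError(f"unmapped resource kind: {resource_kind}")
        if permission not in self.permission_mapping.values():
            raise ValueError(f"unmapped permission: {permission}")
        self.policies.add((principal, resource_kind, permission))

    def is_allowed(self, principal, resource_kind, permission):
        # Constrain an access operation of the computing engine using the policy.
        return (principal, resource_kind, permission) in self.policies
```

- Requiring a mapping before a policy can reference a term is a design choice of this sketch only; it keeps policies from naming resources or permissions that the management device could not translate.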
- The data processing system may include multiple computing engines, in which case the management device includes a metadata model and a permission model adapted to each of the multiple computing engines. In this way, the management device can dock with multiple computing engines with less adaptation effort, which further improves the scalability of the data processing system in docking with computing engines.
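- As a rough illustration of holding one metadata model and one permission model per computing engine, the Python sketch below keeps a registry keyed by an engine identifier. The identifiers `sql_engine` and `ai_engine`, the field names, and the permission names are invented for this example and do not come from the application.

```python
# Hypothetical per-engine models held by the management device.
# "metadata" maps built-in (first model) fields to the engine's field names;
# "permissions" maps the engine's permission names to built-in ones.
ENGINE_MODELS = {
    "sql_engine": {
        "metadata": {"catalog": "catalog", "database": "schema", "table": "table"},
        "permissions": {"READ": "SELECT", "WRITE": "INSERT"},
    },
    "ai_engine": {
        "metadata": {"catalog": "workspace", "database": "dataset_group", "table": "dataset"},
        "permissions": {"LOAD": "SELECT"},
    },
}


def models_for(engine_id):
    """Look up the models registered for one engine, failing clearly if none exist."""
    try:
        return ENGINE_MODELS[engine_id]
    except KeyError:
        raise LookupError(f"no models registered for engine {engine_id!r}") from None


def translate_for_engine(engine_id, first_metadata):
    """Rename built-in fields to the names the given engine expects."""
    field_map = models_for(engine_id)["metadata"]
    return {field_map[name]: value for name, value in first_metadata.items() if name in field_map}


def to_builtin_permission(engine_id, engine_permission):
    """Translate an engine-side permission to the built-in one, or None if unmapped."""
    return models_for(engine_id)["permissions"].get(engine_permission)
```

- With such a registry, supporting an additional engine amounts to adding one more entry rather than changing the built-in models.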
- When determining the metadata of the target data according to the first mapping relationship, the management device may first read first metadata according to the access request, where the first metadata satisfies the first metadata model, and then translate the first metadata, according to the first mapping relationship, into second metadata that satisfies the second metadata model (that is, the aforementioned metadata of the target data). When authenticating the access request, the management device may translate first permission information in the access request, which satisfies the second permission model, into second permission information that satisfies the first permission model according to the second mapping relationship, and then authenticate the second permission information. The management device sends the second metadata to the computing engine after the second permission information is authenticated.
- In this way, the management device can determine the metadata required by the computing engine based on the first mapping relationship and authenticate the computing engine based on the second mapping relationship, achieving adaptation between the management device and the computing engine and thereby improving the scalability of the data processing system in docking with computing engines.
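- The read path described above (read first metadata, translate it, translate the permission information, authenticate, then send) can be sketched in Python as follows. The mappings, grants, stored record, and function names are all hypothetical placeholders; the application does not specify how the mapping relationships are represented.

```python
# Hypothetical mapping relationship 1: built-in (first) model field -> engine (second) model field.
FIELD_MAP_1_TO_2 = {"catalog": "project", "database": "schema", "table": "table", "location": "path"}

# Hypothetical mapping relationship 2: engine (second) permission -> built-in (first) permission.
PERMISSION_MAP_2_TO_1 = {"READ": "SELECT", "WRITE": "INSERT"}

# Hypothetical grants, expressed in the built-in (first) permission model.
GRANTS = {("alice", "sales.orders"): {"SELECT"}}

# Hypothetical metadata store; each record satisfies the first metadata model.
STORE = {
    "sales.orders": {
        "catalog": "main",
        "database": "sales",
        "table": "orders",
        "location": "/lake/sales/orders",
    }
}


def translate_metadata(first_metadata):
    """Translate first metadata into second metadata using mapping relationship 1."""
    return {FIELD_MAP_1_TO_2[name]: value for name, value in first_metadata.items() if name in FIELD_MAP_1_TO_2}


def authenticate(user, target, first_permission_info):
    """Translate the engine-side permission information and check it against the grants."""
    second_permission_info = PERMISSION_MAP_2_TO_1.get(first_permission_info)
    return second_permission_info is not None and second_permission_info in GRANTS.get((user, target), set())


def handle_access_request(user, target, first_permission_info):
    first_metadata = STORE[target]                         # read metadata in the first model
    second_metadata = translate_metadata(first_metadata)   # translate to the second model
    if not authenticate(user, target, first_permission_info):
        raise PermissionError(f"{user} may not {first_permission_info} {target}")
    return second_metadata                                 # sent only after authentication passes
```

- Note that, following the naming in the text, the "first permission information" is the engine-side value carried in the request, and the "second permission information" is its translation into the built-in permission model.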
- The present application provides a data processing system, which includes a computing engine, a management device, and a storage device.
- The management device has a first metadata model and a first permission model built in, and the first metadata model and the first permission model are adapted to the storage device;
- the computing engine is used to generate an access request for the metadata of the target data and send the access request to the management device;
- the management device is used to respond to the access request, determine the metadata of the target data according to the first mapping relationship, and authenticate the access request according to the second mapping relationship; after the access request is authenticated, the metadata of the target data is sent to the computing engine, wherein the first mapping relationship is a mapping relationship between the first metadata model and the second metadata model, the second metadata model is adapted to the computing engine, the metadata of the target data satisfies the second metadata model, and the second mapping relationship is a mapping relationship between the first permission model and the second permission model; the computing engine is also used to read the target data stored in the storage device according to the metadata of the target data.
- the computing engine is further used to generate a metadata update request and send the metadata update request to the management device; the management device is further used to respond to the metadata update request, translate original metadata carried in the metadata update request into target metadata according to the first mapping relationship, the original metadata satisfies the second metadata model, the target metadata satisfies the first metadata model, authenticate the metadata update request according to the second mapping relationship, and update the target metadata to the management device after the metadata update request passes the authentication.
- the management device is further configured to output a configuration interface, and establish a first mapping relationship in response to a first operation performed by a user on the configuration interface, and establish a second mapping relationship in response to a second operation performed by a user on the configuration interface.
- the management device is further configured to generate an access control policy for the second metadata model and the second permission model in response to a third operation performed by the user on the configuration interface.
- the data processing system includes multiple computing engines
- the management device includes a metadata model and a permission model adapted for each of the multiple computing engines.
- the metadata of the target data is the second metadata
- the management device is specifically used to read the first metadata according to the access request, the first metadata satisfies the first metadata model, translate the first metadata into the second metadata that satisfies the second metadata model according to the first mapping relationship, translate the first permission information in the access request into the second permission information that satisfies the first permission model according to the second mapping relationship, the first permission information satisfies the second permission model, and authenticate the second permission information; and after the second permission information is authenticated, send the second metadata to the computing engine.
- the present application provides a management device, which is applied to a data processing system.
- the data processing system also includes a computing engine and a storage device.
- the management device is equipped with a first metadata model and a first permission model, and the first metadata model and the first permission model are adapted to the storage device;
- the management device includes: an interaction module, which is used to receive an access request for metadata of target data sent by the computing engine, and the target data is stored in the storage device; a metadata determination module, which is used to respond to the access request and determine the metadata of the target data according to a first mapping relationship, the first mapping relationship is a mapping relationship between the first metadata model and the second metadata model, the second metadata model is adapted to the computing engine, and the metadata of the target data satisfies the second metadata model; an authentication module, which is used to authenticate the access request according to the second mapping relationship, the second mapping relationship is a mapping relationship between the first permission model and the second permission model, and the second permission model is adapted to the computing engine;
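- the interaction module is further used to send the metadata of the target data to the computing engine after the access request is authenticated.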
- the interaction module is further used for the management device to receive a metadata update request sent by the computing engine;
- the metadata determination module is further used to respond to the metadata update request and translate the original metadata carried in the metadata update request into target metadata according to the first mapping relationship, where the original metadata satisfies the second metadata model and the target metadata satisfies the first metadata model;
- the authentication module is further used to authenticate the metadata update request according to the second mapping relationship;
- the interaction module is further used to update the target metadata to the management device after the metadata update request passes the authentication.
- the interaction module is also used to output the configuration interface;
- the management device also includes: a mapping module, used to respond to a first operation performed by the user on the configuration interface, establish a first mapping relationship, and respond to a second operation performed by the user on the configuration interface, establish a second mapping relationship.
- the mapping module is further configured to generate an access control policy for the second metadata model and the second permission model in response to a third operation performed by the user on the configuration interface.
- the data processing system includes multiple computing engines
- the management device includes a metadata model and a permission model adapted for each of the multiple computing engines.
- the metadata of the target data is the second metadata
- the metadata determination module is specifically used to read the first metadata according to the access request, where the first metadata satisfies the first metadata model, and to translate the first metadata into the second metadata that satisfies the second metadata model according to the first mapping relationship;
- the authentication module is specifically used to translate the first permission information in the access request, which satisfies the second permission model, into the second permission information that satisfies the first permission model according to the second mapping relationship, and to authenticate the second permission information;
- the interaction module is specifically used to send the second metadata to the computing engine after the second permission information is authenticated.
- the management device provided in the third aspect corresponds to the data processing method provided in the first aspect. Therefore, the technical effects of the third aspect and any implementation method of the third aspect can refer to the technical effects of the first aspect or the corresponding implementation method of the first aspect.
- The present application provides a computing device cluster, wherein the computing device cluster includes at least one computing device, and the at least one computing device includes at least one processor and at least one memory; the at least one memory is used to store instructions, and the at least one processor executes the instructions stored in the at least one memory, so that the computing device cluster executes the data processing method in the above-mentioned first aspect or any possible implementation of the first aspect.
- the memory can be integrated into the processor or can be independent of the processor.
- the at least one computing device may also include a bus.
- the processor is connected to the memory via a bus.
- the memory may include a read-only memory and a random access memory.
- The present application provides a computer-readable storage medium, wherein instructions are stored in the computer-readable storage medium, and when the instructions are executed on at least one computing device, the at least one computing device executes the method described in the first aspect or any one of the implementations of the first aspect.
- the present application provides a computer program product comprising instructions, which, when executed on at least one computing device, enables the at least one computing device to execute the method described in the first aspect or any one of the implementations of the first aspect.
- FIG. 1 is a schematic diagram of the structure of an exemplary data processing system provided by the present application;
- FIG. 2 is a schematic diagram of the data structure of the metadata model 1 built into the management device 102 provided by the present application;
- FIG. 3 is a schematic diagram of a permission model 1 built into the management device 102 provided by the present application;
- FIG. 4 is a schematic diagram of an exemplary access control policy provided by the present application;
- FIG. 5 is a schematic diagram of a data structure of a metadata model 2 adapted to the computing engine 1012 provided by the present application;
- FIG. 6 is a schematic diagram of the data structure of another metadata model 2 adapted to the computing engine 1012 provided by the present application;
- FIG. 7 is a schematic diagram of a permission model 2 adapted to the computing engine 1012 provided by the present application;
- FIG. 8 is a schematic diagram of another permission model 2 adapted to the computing engine 1012 provided by the present application;
- FIG. 9 is a schematic diagram of establishing a mapping between metadata model 1 and metadata model 2 provided by the present application;
- FIG. 10 is a schematic diagram of establishing a mapping between a permission model 1 and a permission model 2 provided by the present application;
- FIG. 11 is a schematic diagram of establishing another mapping between a permission model 1 and a permission model 2 provided by the present application;
- FIG. 12 is a schematic diagram of an access control policy defined for metadata model 2 provided by the present application;
- FIG. 13 is a schematic diagram of another access control policy defined for metadata model 2 provided by the present application;
- FIG. 14 is a flow chart of an exemplary data processing method provided by the present application;
- FIG. 15 is a schematic diagram of the structure of a computing device provided by the present application;
- FIG. 16 is a schematic diagram of the structure of a computing device cluster provided by the present application.
- the data processing system 100 includes a computing device 101, a management device 102, and a storage device 103, and the computing device 101, the management device 102, and the storage device 103 can be connected to each other through a network.
- The computing device 101 may include one or more computing engines, such as a structured query language (SQL) computing engine, an artificial intelligence (AI) computing engine, or a third-party computing engine. A computing engine may come from an open source community or may be a commercial computing engine launched by a cloud vendor. Taking the SQL computing engine as an example, it may be a Presto engine, a Hive engine, a Spark engine, a ClickHouse engine, and so on, each of which has both open source community versions and commercial versions. For ease of understanding, FIG. 1 takes as an example the computing device 101 including a computing engine 1011 and a computing engine 1012, which belong to different types of computing engines. In other embodiments, the computing device 101 may include any number and any type of computing engines. The computing device 101 is used to read and write data in the storage device 103 through the computing engines it includes.
- the management device 102 has a built-in fixed metadata model 1 and a permission model 1.
- the metadata model 1 and the permission model 1 are adapted to the storage device 103, so that the management device 102 can use the metadata model 1 to manage the metadata corresponding to the data stored in the storage device 103, and use the permission model 1 to authenticate the processing operations on the metadata.
- The metadata describes attribute information of the data stored in the storage device 103, such as the directory to which the data belongs, the database to which the data belongs, the storage location of the data, the storage format, the compression algorithm used, and the partition to which the data belongs.
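- For illustration only, the attribute information listed above could be held in a record such as the following Python sketch. The class name `DataMetadata`, the field names, and the sample values are assumptions; the application lists the kinds of attributes but does not define a concrete layout.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class DataMetadata:
    """Hypothetical metadata record covering the attributes named in the text."""

    directory: str                               # directory to which the data belongs
    database: str                                # database to which the data belongs
    storage_location: str                        # where the data is stored
    storage_format: str                          # storage format of the data
    compression_algorithm: Optional[str] = None  # compression algorithm used, if any
    partition: Optional[str] = None              # partition to which the data belongs, if any


# Sample values are invented for the example.
example = DataMetadata(
    directory="main",
    database="sales",
    storage_location="/lake/sales/orders",
    storage_format="parquet",
    compression_algorithm="snappy",
    partition="dt=2023-06-01",
)
```

- A computing engine that holds such a record has what it needs to locate and decode the underlying data, which is why metadata is obtained before the data itself is accessed.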
- the storage device 103 is used for persistent storage of data, such as storing data uploaded by one or more users.
- The storage device 103 can store data in a data block format, a file format, or an object format, or adopt other storage methods such as column storage and message queues, which are not limited in this embodiment.
- When a computing engine (such as the SQL computing engine 1011 or the AI computing engine 1012) in the computing device 101 needs to access data in the storage device 103, it usually first obtains the metadata corresponding to the data through the management device 102 and then accesses the data in the storage device 103 based on that metadata. Consequently, a computing engine deployed in the computing device 101 is required to adapt to the metadata model 1 and permission model 1 built into the management device 102; otherwise, the computing engine cannot interact with the storage device 103 through the management device 102. In actual application scenarios, some computing engines therefore need a complex, difficult, and time-consuming adaptation process to dock with the management device 102. Some computing engines cannot adapt to the built-in metadata model 1 and permission model 1 at all, which limits the scalability of the management device 102 in docking with computing engines in the data processing system 100.
- The present application realizes the connection between the computing engine and the management device 102 by mapping the metadata model 2 and the permission model 2, to which the computing engine can adapt, onto the metadata model 1 and the permission model 1 in the management device 102, respectively.
- The metadata model 2 and the permission model 2 to which the computing engine can adapt can be deployed in the management device 102, and a mapping relationship 1 between the metadata model 2 and the metadata model 1 and a mapping relationship 2 between the permission model 2 and the permission model 1 can be established.
- In this way, the computing engine in the computing device 101 can indirectly adapt to the metadata model 1 built into the management device 102 through the metadata model 2 adapted to it, and indirectly adapt to the permission model 1 built into the management device 102 through the permission model 2 adapted to it, so that the computing engine can access the management device 102.
- the data processing system 100 can be deployed in the cloud to provide users with cloud services for data processing, such as cloud services for data computing and data storage.
- the computing device 101, the management device 102, and the storage device 103 in the data processing system 100 can be implemented by a computing device or a computing device cluster in the cloud, respectively; the computing device 101, the management device 102, and the storage device 103 can also be deployed on the same computing device, or deployed in the same computing device cluster.
- the data processing system 100 can be deployed locally, so as to provide users with local data processing services.
- the computing device 101 and the management device 102 in the above data processing system 100 can be implemented by software or hardware respectively.
- computing device 101 may include code running on a computing instance.
- the computing instance may include at least one of a host, a virtual machine, and a container.
- the above-mentioned computing instance may be one or more.
- computing device 101 may include code running on multiple hosts/virtual machines/containers.
- the multiple hosts/virtual machines/containers used to run the code may be distributed in the same region or in different regions.
- the multiple hosts/virtual machines/containers used to run the code may be distributed in the same availability zone (AZ) or in different AZs.
- each AZ includes one data center or multiple geographically close data centers.
- a region can include multiple AZs.
- multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or in multiple VPCs.
- a VPC is set up in a region.
- a communication gateway needs to be set up in each VPC to achieve interconnection between VPCs through the communication gateway.
- When implemented as a hardware functional unit, the computing device 101 may include at least one computing device, such as a server. Alternatively, the computing device 101 may be a device implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
- The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), a data processing unit (DPU), or any combination thereof.
- the multiple computing devices may be distributed in the same region or in different regions.
- the multiple computing devices included in the computing device 101 may be distributed in the same AZ or in different AZs.
- the multiple computing devices included in the computing device 101 may be distributed in the same VPC or in multiple VPCs.
- the multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
- the management device 102 is similar to the computing device 101.
- when implemented by software, the management device 102 may be code running on a computing instance; when implemented by hardware, the management device 102 may include one or more computing devices.
- the storage device 103 is implemented by hardware and may include at least one storage device with data storage capability, such as one or more storage servers, or may include a device with a persistent storage medium, etc.
- the persistent storage medium may be, for example, a disk such as a solid-state drive (SSD) or a hard disk drive (HDD).
- the data processing system 100 shown in FIG. 1 is only used as an exemplary illustration. In actual application, the data processing system 100 may also have other implementations.
- the management device 102 in the data processing system 100 may include a greater number or more types of metadata models and permission models; alternatively, the metadata model 2 and the permission model 2 adapted to the computing engine in the computing device 101 may be configured outside the management device 102, for example in the computing device 101. This embodiment does not limit this.
- the management device 102 in the data processing system 100 is fixedly configured with a metadata model 1 and a permission model 1, which may be, for example, a metadata model and a permission model adapted to an open source Hive engine.
- the metadata model 1 built into the management device 102 adopts the metadata structure shown in Figure 2.
- the metadata model 1 includes information such as directory (catalog), database, data table, function, column, partition, row, and location.
- the directory is the top-level data structure resource in the storage device 103, that is, the largest namespace.
- a directory may include N1 databases, where N1 is a natural number.
- the database is the data structure resource at the next level of the directory.
- the lower-level data structure resources of the database include data tables and functions, and a database may include N2 data tables and N3 functions, where N2 and N3 are both natural numbers.
- a data table, which usually includes views and indexes, is a lower-level data structure resource of a database.
- the lower-level data structure resources of the data table span two dimensions: one is the vertical data organization "column", and the other is the horizontal data organization "partition" or "row".
- the data table and its lower-level data structure resource "partition" are mapped to a "location", which indicates the storage location, in the storage device 103, of the underlying data of the data table and the partition.
- functions, including built-in functions and user-defined scalar functions (UDFs), are lower-level data structure resources of the database and are also mapped to a "location". Here the "location" indicates the storage location of the software package (JAR package) of the function's implementation class (such as a Java implementation class).
- the metadata model 1 may also adopt other metadata structures adapted to the storage device 103, and this embodiment does not limit this.
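The hierarchy described above (directory, database, data table and function, partition, location) can be pictured with a small sketch. This is only an illustration under stated assumptions: the class names, the `resolve_location` helper, and the example paths are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Table:
    name: str
    columns: List[str] = field(default_factory=list)
    # partition name -> storage location of that partition's underlying data
    partitions: Dict[str, str] = field(default_factory=dict)
    location: Optional[str] = None  # storage location of the table's data


@dataclass
class Function:
    name: str
    location: Optional[str] = None  # storage location of the function's JAR package


@dataclass
class Database:
    name: str
    tables: Dict[str, Table] = field(default_factory=dict)
    functions: Dict[str, Function] = field(default_factory=dict)


@dataclass
class Catalog:
    """Top-level namespace (the "directory" level)."""
    name: str
    databases: Dict[str, Database] = field(default_factory=dict)


def resolve_location(catalog: Catalog, db: str, table: str,
                     partition: Optional[str] = None) -> Optional[str]:
    """Walk directory -> database -> table (-> partition) and return the location."""
    tbl = catalog.databases[db].tables[table]
    if partition is not None:
        return tbl.partitions[partition]
    return tbl.location


# Hypothetical example data.
orders = Table(
    name="orders",
    columns=["id", "amount"],
    partitions={"dt=2023-01-01": "/warehouse/sales/orders/dt=2023-01-01"},
    location="/warehouse/sales/orders",
)
catalog = Catalog("main", {"sales": Database("sales", tables={"orders": orders})})
```

Calling `resolve_location(catalog, "sales", "orders")` walks the levels top-down and yields the table's location; passing a partition name yields that partition's location instead.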
- the permission model 1 built into the management device 102 may be as shown in FIG. 3.
- the permissions corresponding to the directory may include "all", "create database", "modify (alter)", "create catalog", "list all databases", etc.
- the permissions corresponding to the database may include "all", "create table", "modify", "delete (drop, that is, delete structure)", "describe", "list table", "list function", "list database", "list all databases", etc.
- the permissions corresponding to the data table may include "all", "modify", "delete (structure)", "describe", "update", "insert", "delete (delete, that is, delete data)", "query (select)", etc.
- the permissions corresponding to the columns may include "all", "query", etc.
- the permissions corresponding to the function may include "all", "create", "execute", "delete (structure)", etc.
- the permissions corresponding to the location may include: “read”, “write”, etc.
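As a rough illustration, the per-level permissions listed above can be held in a lookup table. The sketch below is hypothetical: the dictionary keys, the `is_granted` helper, and in particular the rule that "all" implies every other permission on the same level are assumptions made for illustration, not statements from the application.

```python
from typing import Dict, Set

# Permission names per metadata level, copied from the lists above
# (the original lists end with "etc.", so these sets are not exhaustive).
PERMISSION_MODEL_1: Dict[str, Set[str]] = {
    "directory": {"all", "create database", "modify", "create catalog",
                  "list all databases"},
    "database": {"all", "create table", "modify", "delete (structure)",
                 "describe", "list table", "list function", "list database",
                 "list all databases"},
    "table": {"all", "modify", "delete (structure)", "describe", "update",
              "insert", "delete (data)", "query"},
    "column": {"all", "query"},
    "function": {"all", "create", "execute", "delete (structure)"},
    "location": {"read", "write"},
}


def is_granted(level: str, granted: Set[str], requested: str) -> bool:
    """Return True if `requested` is covered by the `granted` permissions.

    Assumption: on levels that define "all", holding "all" covers every
    other permission of that level.
    """
    defined = PERMISSION_MODEL_1[level]
    if requested not in defined:
        raise ValueError(f"{requested!r} is not defined for level {level!r}")
    if requested in granted:
        return True
    return "all" in defined and "all" in granted
```

For example, `is_granted("table", {"all"}, "insert")` is true under the stated assumption, while `is_granted("location", {"read"}, "write")` is false.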
- the management device 102 may also include an access control policy for the metadata model 1 and the permission model 1, which is used to indicate the permission content required for the computing engine to call the application programming interface (API) to access the metadata in the management device 102.
- the access control policy may be as shown in FIG. 4.
- the required permission is the "list” permission or the "all” permission.
- the required permission is the global "create directory" permission.
- the permission policies indicate the operations allowed, the objects of the operations, and the subjects who request the operations.
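A minimal sketch of how an access control policy and permission policies could interact is shown below. The API names, the `PermissionPolicy` type, the object naming scheme, and the use of `"*"` for a global object are all hypothetical; only the idea that an API call requires certain permissions, and that a permission policy names a subject, an operation, and an object, is taken from the text above.

```python
from typing import Dict, NamedTuple, Set


class PermissionPolicy(NamedTuple):
    subject: str    # who requests the operation, e.g. a computing engine identifier
    operation: str  # the operation that is allowed
    obj: str        # the object the operation applies to ("*" = global, an assumption)


# Hypothetical access control policy: API name -> permissions, any one of which suffices.
ACCESS_CONTROL_POLICY: Dict[str, Set[str]] = {
    "list_databases": {"list", "all"},
    "create_directory": {"create directory"},
}


def is_api_call_allowed(policies: Set[PermissionPolicy], subject: str,
                        api: str, obj: str) -> bool:
    """Check whether `subject` holds one of the permissions `api` requires on `obj`."""
    required = ACCESS_CONTROL_POLICY[api]
    return any(
        p.subject == subject and p.obj == obj and p.operation in required
        for p in policies
    )


# Hypothetical permission policies configured in the management device.
policies = {
    PermissionPolicy("engine-1011", "list", "directory:main"),
    PermissionPolicy("engine-1011", "create directory", "*"),
}
```

With these policies, `engine-1011` may call `list_databases` on `directory:main` and `create_directory` globally, while any other subject is rejected.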
- the computing engine 1011 can access the management device 102 based on the metadata model 1 and the permission model 1, so that the computing engine 1011 can be connected to the management device 102 without additional configuration operations.
- when the computing engine 1012 is deployed in the computing device 101, the computing engine 1012 is not compatible with the metadata model 1 and the permission model 1 built into the management device 102.
- the computing engine 1012 may be a Presto engine, etc.
- a metadata model 2 and a permission model 2 compatible with the computing engine 1012 are deployed in the management device 102, and a mapping relationship 1 is established between the metadata model 2 and the metadata model 1, and a mapping relationship 2 is established between the permission model 2 and the permission model 1.
- the deployed metadata model 2 and permission model 2 may be user-defined models, or may be known models adapted to the computing engine 1012.
- a user may request to deploy a computing engine 1012 in the data processing system 100, and provide the data processing system 100 with a metadata model 2 and a permission model 2 customized for the computing engine 1012.
- for example, the user-defined metadata model 2 may adopt the metadata structure shown in FIG. 5, in which it includes a directory, a schema, a data table, a UDF (user-defined scalar function), a column, a partition, a row, and a location.
- alternatively, the user-defined metadata model 2 may adopt the metadata structure shown in FIG. 6, in which it includes a database, a data table, a view, a row, and a location.
- the user can also define the permission model 2 shown in Figure 7.
- the permissions corresponding to the directory can include “all” or “administrator”.
- the permissions corresponding to the schema include "use", "create", "delete (structure)", etc.
- the permissions corresponding to the UDF include “all”, “create”, “delete (structure)”, “modify”, “query”, etc.
- the permissions corresponding to the data table include “all”, “query”, “insert”, “delete (data)”, “modify”, “update”, etc. (the permissions corresponding to the remaining metadata are not shown).
- the permissions corresponding to the database may include “create”.
- the permissions corresponding to the view may include “query”, “create”, etc.
- the permissions corresponding to the data table may include “query”, “insert”, etc. (the permissions corresponding to the other metadata are not shown).
- the user may define a mapping relationship 1 between the metadata model 2 and the metadata model 1 built into the management device 102.
- the data processing system 100 may present a client to the outside, and the client may be, for example, an application running on a user-side device, or a web browser provided to the outside by the data processing system 100.
- the management device 102 may include the interaction module 1021 and the mapping module 1022, wherein the interaction module 1021 may output a configuration interface to the client, and the client may present the configuration interface to the user.
- the user may perform a first operation on the configuration interface to establish a mapping relationship 1 between the metadata model 2 and the metadata model 1.
- the interaction module 1021 may feed back the first operation performed by the user to the mapping module 1022, and the mapping module 1022 may, according to the first operation, map a metadata level in the metadata model 2 to the corresponding metadata level in the metadata model 1.
- for example, based on the first operation performed by the user, the mapping module 1022 may map the "schema" level in the metadata model 2 to the "database" level in the metadata model 1, or may map the "UDF" level in the metadata model 2 to the "function" level in the metadata model 1.
- as another example, based on the first operation performed by the user, the mapping module 1022 may map the "database" level in the metadata model 2 to both the "directory" level and the "database" level in the metadata model 1, or may map the "view" level in the metadata model 2 to the "data table" level in the metadata model 1. The mapping module 1022 then maps the remaining metadata levels in the two metadata models based on the levels already mapped.
- the mapping module 1022 may directly map the attribute that exists in the metadata model 2 but not in the metadata model 1 to an existing attribute in the metadata model 1 (that is, an existing metadata level, the same below) during the process of establishing the mapping relationship 1; and may map the attribute that does not exist in the metadata model 2 but exists in the metadata model 1 to an existing attribute in the metadata model 2.
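The level-to-level mapping described above can be sketched as a simple lookup table. The sketch assumes a metadata model 2 of the kind with "schema" and "UDF" levels; the key names, the dictionary representation of a metadata record, and the `translate_metadata` helper are hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical mapping relationship 1: metadata model 2 level -> metadata model 1 level.
LEVEL_MAP_2_TO_1: Dict[str, str] = {
    "directory": "directory",
    "schema": "database",   # "schema" in model 2 maps to "database" in model 1
    "table": "table",
    "udf": "function",      # "UDF" in model 2 maps to "function" in model 1
    "column": "column",
    "partition": "partition",
    "row": "row",
    "location": "location",
}

# The reverse direction is derived from the same table (valid because it is one-to-one here).
LEVEL_MAP_1_TO_2: Dict[str, str] = {v: k for k, v in LEVEL_MAP_2_TO_1.items()}


def translate_metadata(metadata: Dict[str, str],
                       level_map: Dict[str, str]) -> Dict[str, str]:
    """Rename each metadata level according to `level_map`; values are unchanged."""
    return {level_map[level]: value for level, value in metadata.items()}


# A record expressed in metadata model 1, translated for an engine that expects model 2.
model_1_record = {"directory": "main", "database": "sales", "table": "orders"}
model_2_record = translate_metadata(model_1_record, LEVEL_MAP_1_TO_2)
```

A one-to-many mapping, such as a model-2 "database" that corresponds to both "directory" and "database" in model 1, would need a richer structure than this one-to-one table.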
- the mapping module 1022 also establishes a mapping relationship 2 between the permission model 2 and the permission model 1 built into the management device 102.
- the user can perform a second operation on the mapping relationship between the permission models on the configuration interface presented by the client.
- the interaction module 1021 can feed back the second operation to the mapping module 1022, and the mapping module 1022 establishes a mapping relationship 2 between the operation permissions corresponding to each metadata level in the metadata model 2 and the operation permissions corresponding to each metadata level in the metadata model 1 according to the second operation.
- the mapping module 1022 can establish a mapping relationship between the permission model 2 shown in FIG. 7 and the permission model 1.
- for example, the operation permissions "all" and "administrator" for the directory in the permission model 2 can be mapped to operation permissions such as "all", "create database", and "modify" in the permission model 1; the mapping between the directory's operation permissions in the two permission models may be one-to-one, many-to-one, or one-to-many.
- the specific setting can be based on the needs of the actual application, and this embodiment does not limit this.
- the mapping relationship between the operation permissions corresponding to the other metadata levels can also be established in a similar manner.
- the mapping module 1022 can establish a mapping relationship between the permission model 2 shown in FIG. 8 and the permission model 1.
- for example, the operation permissions "query" and "insert" of the data table in the permission model 2 can be mapped to operation permissions such as "query", "insert", "delete (structure)", and "describe" in the permission model 1.
- depending on the needs of the actual application, all of the operation permissions in the permission model 1 can be mapped, or only some of them, for example by establishing only a one-to-one mapping between the operation permissions "query" and "insert" of the data table in the permission model 2 and the operation permissions "query" and "insert" in the permission model 1.
- the mapping relationship between the operation permissions corresponding to the remaining metadata levels can also be established in a similar manner.
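The permission mapping can be sketched in the same way as the level mapping, except that one permission in the permission model 2 may correspond to several permissions in the permission model 1. The concrete pairs below, the `Permission` tuple type, and the choice to reject unmapped permissions are assumptions for illustration; the application leaves the actual mapping to the user.

```python
from typing import Dict, Set, Tuple

Permission = Tuple[str, str]  # (metadata level, operation permission)

# Hypothetical mapping relationship 2: permission in model 2 -> permissions in model 1.
PERMISSION_MAP_2_TO_1: Dict[Permission, Set[Permission]] = {
    # one-to-one
    ("directory", "all"): {("directory", "all")},
    # one-to-many
    ("directory", "administrator"): {("directory", "create database"),
                                     ("directory", "modify")},
    # partial mapping: only "query" and "insert" are mapped for data tables
    ("table", "query"): {("table", "query")},
    ("table", "insert"): {("table", "insert")},
}


def translate_permission(level: str, operation: str) -> Set[Permission]:
    """Translate a model-2 permission into the model-1 permissions it maps to.

    Assumption: a permission without a mapping cannot be translated and is rejected.
    """
    try:
        return PERMISSION_MAP_2_TO_1[(level, operation)]
    except KeyError:
        raise PermissionError(
            f"no mapping for {operation!r} on level {level!r}"
        ) from None
```

Here `translate_permission("directory", "administrator")` yields two model-1 permissions, while a request for an unmapped permission such as `("table", "update")` raises `PermissionError`.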
- the user can also define access control policies for calling metadata model 2 and permission model 2 for the computing engine 1012 to be deployed.
- the user can perform a third operation for the access control policy on the configuration interface presented by the client.
- the interaction module 1021 can feed back the third operation to the mapping module 1022, and the mapping module 1022 creates a corresponding access control policy based on the third operation.
- specifically, for each API that adds, deletes, modifies, or queries a metadata level in the metadata model 2, the mapping module 1022 may, based on the permission model 2 corresponding to the metadata model 2, define the permissions that the API requires at that metadata level or at its parent level, thereby generating the corresponding access control policy.
- based on the permission model 2, the mapping module 1022 can also define the permissions required by the APIs that add, delete, modify, or query the permission policies corresponding to the permission model 2, thereby generating the corresponding access control policy.
- the access control policy generated by the mapping module 1022 based on the third operation may be as shown in FIG. 12.
- the required permission is the "all" permission or the "administrator" permission.
- the access control policy generated by the mapping module 1022 based on the third operation may be as shown in FIG. 13.
- the required permission is the global "Create" permission.
- the required permission for the database to which the data table belongs is the "query" permission.
- the access control policy defined by the mapping module 1022 based on the third operation performed by the user may also be an access control policy of other implementation modes, which is not limited in this embodiment.
- the management device 102 can provide the computing engine 1012 with metadata corresponding to the data in the storage device 103, or save the metadata corresponding to the data written by the computing engine 1012 to the storage device 103. The following describes these two processes in detail.
- when the computing engine 1012 needs to read data stored in the storage device 103 (hereinafter referred to as the target data), the computing engine 1012 can generate an access request for the metadata of the target data, and send the access request to the management device 102. Specifically, it can call the API of the metadata model 2 and the permission model 2 in the management device 102, so that the management device 102 receives the access request through that API.
- the management device 102 includes an interaction module 1021, a metadata determination module 1023, and an authentication module 1024, as shown in FIG. 1.
- the interaction module 1021 can provide the access request to the metadata determination module 1023 and the authentication module 1024.
- the metadata determination module 1023 responds to the access request, determines the metadata corresponding to the target data according to the metadata model 2 and the metadata model 1, and feeds the metadata back to the interaction module 1021.
- the authentication module 1024 authenticates the access request according to the permission model 2 and the permission model 1, and feeds back the authentication result to the interaction module 1021.
- when the authentication result indicates that the access request passes the authentication, the interaction module 1021 sends the metadata corresponding to the target data to the computing engine 1012.
- the access request received by the interaction module 1021 carries indication information for the target data, an execution operation, and an identifier of the computing engine 1012.
- the metadata determination module 1023 may specifically determine the metadata to be accessed by the computing engine 1012 according to the indication information carried in the access request, which is referred to as the first metadata below for the sake of distinction. Since the first metadata satisfies the metadata model 1, and the metadata model 1 is not compatible with the computing engine 1012, the metadata determination module 1023 may translate the first metadata into the second metadata that satisfies the metadata model 2 according to the mapping relationship 1 between the metadata model 1 and the metadata model 2. It can be understood that the metadata model 2 is compatible with the computing engine 1012, and therefore, the computing engine 1012 can identify the second metadata based on the data structure adopted by the metadata model 2.
- after generating the second metadata, the management device 102 does not directly send the second metadata to the computing engine 1012, but sends it only when the access request passes the authentication.
- the authentication module 1024 in the management device 102 determines the first permission information carried in the access request. The first permission information may include the identifier of the computing engine 1012, the requested operation (query), and the indication information of the target data (that is, information indicating the metadata corresponding to the target data). In general, the first permission information is permission information that satisfies the permission model 2.
- the authentication module 1024 can translate the first permission information into the second permission information that satisfies the permission model 1 according to the mapping relationship 2 between the permission model 2 and the permission model 1, and use the pre-configured permission policy (and access control policy) to authenticate the second permission information to determine whether the computing engine 1012 has the permission to perform the operation on the metadata, generate an authentication result for the access request and feed it back to the interaction module 1021.
- when the authentication result indicates that the access request passes the authentication, the interaction module 1021 sends the second metadata translated by the metadata determination module 1023 to the computing engine 1012; when the authentication result indicates that the access request fails the authentication, the interaction module 1021 can feed back to the computing engine 1012 response information indicating that the request failed or the authentication failed.
- the above description takes as an example the case where the metadata determination module 1023 and the authentication module 1024 generate the second metadata and authenticate the access request in parallel.
- in other embodiments, the authentication module 1024 may first authenticate the access request and feed back the authentication result to the metadata determination module 1023; the metadata determination module 1023 generates the second metadata only when it determines that the authentication result indicates that the access request has passed the authentication.
- the computing engine 1012 can access the target data stored in the storage device 103 according to the second metadata.
- the computing engine 1012 can directly access the storage device 103 according to the second metadata to obtain the target data to be read; or the computing engine 1012 can call the data API interface provided by the management device 102 to the outside according to the second metadata, so as to indirectly access the target data stored in the storage device 103 by using the data API interface, which is not limited in this embodiment.
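The read path described above can be summarized in a short sketch: look up the first metadata (metadata model 1), translate it into the second metadata (metadata model 2), translate the first permission information into the second permission information (permission model 1), authenticate it, and return the second metadata only on success. All names, the request fields, and the in-memory stores are hypothetical; the sketch also simplifies the permission mapping to one-to-one.

```python
from typing import Dict, Set, Tuple

# Hypothetical mapping relationship 1 (model 1 level -> model 2 level).
LEVEL_MAP_1_TO_2: Dict[str, str] = {
    "directory": "directory", "database": "schema",
    "table": "table", "location": "location",
}
# Hypothetical mapping relationship 2 (model 2 permission -> model 1 permission).
PERMISSION_MAP_2_TO_1: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("table", "query"): ("table", "query"),
}
# Metadata held by the management device in model-1 form, keyed by indication information.
METADATA_STORE: Dict[str, Dict[str, str]] = {
    "orders": {"directory": "main", "database": "sales",
               "table": "orders", "location": "/warehouse/sales/orders"},
}
# Pre-configured permission policies in model-1 form: (subject, level, operation, object).
PERMISSION_POLICIES: Set[Tuple[str, str, str, str]] = {
    ("engine-1012", "table", "query", "orders"),
}


def handle_access_request(request: Dict[str, str]) -> Dict[str, object]:
    # Metadata determination: first metadata (model 1) -> second metadata (model 2).
    first_metadata = METADATA_STORE[request["target"]]
    second_metadata = {LEVEL_MAP_1_TO_2[level]: value
                       for level, value in first_metadata.items()}

    # Authentication: first permission information (model 2) -> second (model 1).
    mapped = PERMISSION_MAP_2_TO_1.get((request["level"], request["operation"]))
    authenticated = (
        mapped is not None
        and (request["engine_id"], mapped[0], mapped[1], request["target"])
        in PERMISSION_POLICIES
    )

    # The second metadata is released only when authentication passes.
    if not authenticated:
        return {"status": "authentication failed"}
    return {"status": "ok", "metadata": second_metadata}
```

The engine then uses the returned model-2 metadata (for example its `location` entry) to reach the target data, either directly or through a data API.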
- the computing engine 1012 may also request access to the permission policy in the management device 102.
- the computing engine 1012 may generate an access request for the target permission policy in the management device 102, the access request including the identifier of the computing engine 1012, the indication information of the target permission policy, and the operation (query) for the target permission policy, and send the access request to the interaction module 1021, which provides the access request to the authentication module 1024.
- the authentication module 1024 may authenticate the access request based on the permission configuration for the permission policy to determine whether the computing engine 1012 has the permission to access the target permission policy built into the management device 102.
- after the access request passes the authentication, the authentication module 1024 may translate the target permission policy into a permission policy that satisfies the permission model 2, and feed it back to the interaction module 1021, so that the interaction module 1021 sends the permission policy to the computing engine 1012.
- when the computing engine 1012 needs to write new data to the storage device 103, it can generate metadata corresponding to the new data according to the storage plan for the new data. For ease of description this is referred to as the original metadata, and it usually satisfies the metadata model 2. The computing engine 1012 can then generate a metadata update request including the original metadata (the request can also include the identifier of the computing engine 1012 and the operation requested to be performed), and send the metadata update request to the interaction module 1021 in the management device 102.
- the interaction module 1021 may provide the received metadata update request to the authentication module 1024 and the metadata determination module 1023.
- the authentication module 1024 may first authenticate the metadata update request, specifically by determining the third permission information carried in the metadata update request, which may include the identifier of the computing engine 1012, the requested operation (such as modification or creation), and the original metadata. The authentication module 1024 may then translate the third permission information into fourth permission information that satisfies the permission model 1 according to the mapping relationship 2 between the permission model 2 and the permission model 1, authenticate the fourth permission information using the pre-configured permission policy (and access control policy) to determine whether the computing engine 1012 has the permission to perform the operation on the original metadata, generate an authentication result for the metadata update request, and feed it back to the metadata determination module 1023.
- when the metadata determination module 1023 determines that the authentication result indicates that the metadata update request has passed the authentication, it translates the original metadata carried in the metadata update request into target metadata that satisfies the metadata model 1 according to the mapping relationship 1 between the metadata model 2 and the metadata model 1, and updates the target metadata to the management device 102.
- the target metadata can be persistently stored in the management device 102.
- the above description takes as an example the case where the metadata determination module 1023 translates the original metadata after determining that the metadata update request has passed the authentication.
- in other embodiments, the metadata determination module 1023 can translate the metadata in parallel with the authentication of the request by the authentication module 1024, and this embodiment does not limit this.
- the computing engine 1012 may write the new data into the storage device 103 according to the original metadata or the target metadata.
- the computing engine 1012 may directly access the storage device 103 and write the new data into the storage device 103; or the computing engine 1012 may call the data API interface provided by the management device 102 to write the new data into the storage device 103 indirectly through the management device 102.
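The write path can be sketched in the same style, here in the authenticate-first order: translate the third permission information into the fourth permission information, authenticate it, and only then translate the original metadata (metadata model 2) into target metadata (metadata model 1) and persist it. The request fields, the concrete permission pair, and the in-memory store are assumptions for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical mapping relationship 1 (model 2 level -> model 1 level).
LEVEL_MAP_2_TO_1: Dict[str, str] = {
    "directory": "directory", "schema": "database",
    "table": "table", "location": "location",
}
# Hypothetical mapping relationship 2: "create" on a schema in model 2 is assumed
# to correspond to "create table" on a database in model 1.
PERMISSION_MAP_2_TO_1: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("schema", "create"): ("database", "create table"),
}
# Pre-configured permission policies in model-1 form: (subject, level, operation, object).
PERMISSION_POLICIES: Set[Tuple[str, str, str, str]] = {
    ("engine-1012", "database", "create table", "sales"),
}
# Metadata persisted by the management device, in model-1 form.
METADATA_STORE: Dict[str, Dict[str, str]] = {}


def handle_metadata_update(request: Dict[str, object]) -> Dict[str, object]:
    original: Dict[str, str] = request["original_metadata"]  # satisfies model 2

    # Authenticate first: third permission info (model 2) -> fourth (model 1).
    mapped = PERMISSION_MAP_2_TO_1.get((request["level"], request["operation"]))
    if mapped is None or (
        request["engine_id"], mapped[0], mapped[1], original["schema"]
    ) not in PERMISSION_POLICIES:
        return {"status": "authentication failed"}

    # Translate original metadata (model 2) into target metadata (model 1) and persist it.
    target = {LEVEL_MAP_2_TO_1[level]: value for level, value in original.items()}
    METADATA_STORE[target["table"]] = target
    return {"status": "ok", "target_metadata": target}
```

Because the management device stores only the model-1 form, an engine that natively uses metadata model 1 can later read the same entry without any translation.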
- the computing engine 1012 deployed in the computing device 101 can use the metadata model 2 and the permission model 2 adapted to it to achieve adaptation with the metadata model 1 and the permission model 1 built into the management device 102, so that the computing engine 1012 can read and write data in the storage device 103 through the management device 102.
- the computing device 101 includes a plurality of computing engines (such as the computing engine 1011 and the computing engine 1012 in FIG. 1 ), and the management device 102 includes metadata models and permission models adapted for each of the plurality of computing engines, such as the metadata models 1 and 2 and the permission models 1 and 2 mentioned above, and by establishing mappings between different metadata models and mappings between different permission models, data sharing between different computing engines can be achieved.
- the computing engine 1012 can use the metadata model 2 and the permission model 2 to write new data to the storage device 103.
- the computing engine 1011 can use the metadata model 1 and the permission model 1 to write new data to the storage device 103.
- computing engine 1011 can access the data written by computing engine 1012 (or the created data table) based on mapping relationship 1 between metadata models and mapping relationship 2 between permission models; computing engine 1012 can access the data written by computing engine 1011 (or the created data table) based on mapping relationship 1 between metadata models and mapping relationship 2 between permission models.
- FIG. 14 is a flow chart of a data processing method in an embodiment of the present application.
- the method can be applied to the data processing system 100 shown in Figure 1 above, or it can also be applied to other applicable application scenarios.
- the following is an example of application to the data processing system 100 shown in Figure 1.
- the operations performed by the computing device 101 are specifically performed by the computing engine 1012 in the computing device 101; the operations performed by the management device 102 are performed by multiple functional modules included in the management device 102.
- the data processing method shown in FIG. 14 may specifically include:
- the interaction module 1021 provides the access request to the metadata determination module 1023 and the authentication module 1024 respectively.
- the metadata determination module 1023 responds to the access request, determines the metadata of the target data according to the mapping relationship 1 between the metadata model 1 and the metadata model 2, and feeds the metadata of the target data back to the interaction module 1021, wherein:
- the metadata model 2 is adapted to the computing engine 1012, and the metadata of the determined target data satisfies the metadata model 2.
- there is a mapping relationship 1 between the metadata model 1 and the metadata model 2, so that after the metadata determination module 1023 determines the metadata corresponding to the target data according to the access request, it can translate the metadata satisfying the data structure of the metadata model 1 into metadata satisfying the data structure of the metadata model 2 (that is, the metadata of the target data) according to the mapping relationship 1.
- the mapping relationship 1 between the metadata model 1 and the metadata model 2 can be established in advance by the mapping module 1022 in the management device 102.
- the specific implementation process of establishing the mapping relationship 1 can refer to the relevant description of the above-mentioned embodiment, which will not be repeated here.
- the authentication module 1024 authenticates the access request according to the mapping relationship 2 between the permission model 1 and the permission model 2, and feeds back the authentication result to the interaction module 1021, wherein the permission model 2 is adapted to the computing engine 1012.
- the authentication module 1024 can determine the first permission information carried in the access request, and the first permission information can include, for example, the identifier of the computing engine 1012, the requested operation, and the indication information of the target data, and the first permission information is the permission information that satisfies the permission model 2. Therefore, the authentication module 1024 can translate the first permission information into the second permission information that satisfies the permission model 1 according to the mapping relationship 2 between the permission model 2 and the permission model 1, and authenticate the second permission information using the pre-configured permission policy to determine whether the computing engine 1012 has the permission to perform the operation on the metadata, generate the authentication result for the access request and feed it back to the interaction module 1021.
- the mapping relationship 2 between the permission model 1 and the permission model 2 can be established in advance by the mapping module 1022, and the specific implementation process of establishing the mapping relationship 2 can refer to the relevant description of the aforementioned embodiment, which will not be repeated here.
- the computing engine 1012 reads the target data stored in the storage device 103 according to the metadata of the target data.
- the computing engine 1012 can directly access the storage device 103 according to the metadata to read the target data in the storage device 103; or, the computing engine 1012 can call the data API interface provided by the management device 102 to the outside and indirectly read the target data in the storage device 103 through the management device 102.
- step S1403 and step S1404 may be executed simultaneously; or, the authentication module 1024 may first authenticate the access request and provide the generated authentication result to the metadata determination module 1023; then, after determining that the access request has passed the authentication according to the authentication result, the metadata determination module 1023 translates the metadata that satisfies the metadata model 1 into the metadata that satisfies the metadata model 2.
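The two orderings mentioned above (translation and authentication running simultaneously, or authentication first) can be contrasted in a small sketch. The stub functions `translate_metadata` and `authenticate`, the engine identifier, and the use of a thread pool are hypothetical stand-ins; the application does not prescribe how the parallel execution is realized.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional


def translate_metadata(request: Dict[str, str]) -> Dict[str, str]:
    """Stub for the metadata determination module (model 1 -> model 2)."""
    return {"schema": "sales", "table": request["target"]}


def authenticate(request: Dict[str, str]) -> bool:
    """Stub for the authentication module."""
    return request["engine_id"] == "engine-1012"


def handle_parallel(request: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Run translation and authentication at the same time, then gate the result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        metadata_future = pool.submit(translate_metadata, request)
        auth_future = pool.submit(authenticate, request)
        metadata = metadata_future.result()
        passed = auth_future.result()
    return metadata if passed else None


def handle_sequential(request: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Authenticate first; translate only if authentication passes."""
    if not authenticate(request):
        return None
    return translate_metadata(request)
```

Both variants return the same result for a given request. The parallel form may lower latency when authentication usually succeeds, while the sequential form avoids translation work for requests that are rejected.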
- the data processing method provided in this embodiment corresponds to the data processing system 100 shown in FIG. 1 above. Therefore, the specific implementation process of steps S1401 to S1406 can refer to the relevant description of the aforementioned embodiment and will not be repeated here.
- the above description takes as an example the computing engine 1012 reading the target data in the storage device 103.
- in other embodiments, the computing engine 1012 may write new data to the storage device 103.
- in this case, the computing engine 1012 can generate the original metadata corresponding to the new data according to the storage plan for the new data, and the original metadata satisfies the metadata model 2 adapted to the computing engine 1012.
- the computing engine 1012 can generate a metadata update request including the original metadata (the request can also include the identifier of the computing engine 1012 and the operation requested to be performed), and send the metadata update request to the interaction module 1021 in the management device 102.
- the interaction module 1021 can provide the received metadata update request to the authentication module 1024 and the metadata determination module 1023.
- the authentication module 1024 may first authenticate the metadata update request, specifically by determining the third permission information carried in the metadata update request. The third permission information may include the identifier of the computing engine 1012, the requested operation (such as modification or creation), and the original metadata, and the third permission information satisfies the permission model 2. The authentication module 1024 may then translate the third permission information into the fourth permission information that satisfies the permission model 1 according to the mapping relationship 2 between the permission model 2 and the permission model 1, authenticate the fourth permission information using the pre-configured permission policy to determine whether the computing engine 1012 has the permission to perform the operation on the original metadata, generate the authentication result for the metadata update request, and feed it back to the metadata determination module 1023.
- when the metadata determination module 1023 determines that the authentication result indicates that the metadata update request has passed the authentication, it translates the original metadata carried in the metadata update request into target metadata that satisfies the metadata model 1 according to the mapping relationship 1 between the metadata model 2 and the metadata model 1, and updates the target metadata to the management device 102; specifically, the target metadata may be persistently stored in the management device 102.
- the computing engine 1012 may write new data into the storage device 103 according to the original metadata or the target metadata.
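The write path described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the class names, the two mapping tables, and the shape of the permission policy are all assumptions made for the example. It shows the order described in the text, where authentication runs first and the original metadata is translated and persisted only after authentication passes.

```python
from dataclasses import dataclass, field

# Hypothetical mapping relationship 1: metadata level in model 2 -> level in model 1.
LEVEL_MAP_2_TO_1 = {"schema": "database", "table": "table", "udf": "function"}

# Hypothetical mapping relationship 2: operation in permission model 2 -> permission in model 1.
PERMISSION_MAP_2_TO_1 = {"create": "create table", "alter": "alter"}


@dataclass
class UpdateRequest:
    engine_id: str
    operation: str           # requested operation, e.g. "create" or "alter"
    original_metadata: dict  # satisfies metadata model 2, e.g. {"level": "schema", ...}


@dataclass
class ManagementDevice:
    # Pre-configured permission policy: allowed (engine id, model 1 permission) pairs.
    policy: set
    persisted: list = field(default_factory=list)  # stands in for persistent storage

    def authenticate(self, request: UpdateRequest) -> bool:
        # Third permission information (model 2) -> fourth permission information (model 1).
        model_1_permission = PERMISSION_MAP_2_TO_1.get(request.operation)
        return (request.engine_id, model_1_permission) in self.policy

    def handle_update(self, request: UpdateRequest):
        # Authentication runs first; translation happens only after it passes.
        if not self.authenticate(request):
            return None
        target_metadata = dict(request.original_metadata)
        target_metadata["level"] = LEVEL_MAP_2_TO_1[target_metadata["level"]]
        self.persisted.append(target_metadata)
        return target_metadata
```

A request from an engine that holds the mapped permission returns the translated target metadata; any other request returns `None` and nothing is persisted.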
- the management device 102 (including the interaction module 1021, mapping module 1022, metadata determination module 1023 and authentication module 1024) involved in the data processing process may be software configured on a computing device or a computing device cluster, and by running the software on the computing device or computing device cluster, the computing device or computing device cluster may implement the functions of the management device 102.
- the management device 102 involved in the data processing process is introduced in detail.
- Figure 15 shows a structural diagram of a computing device, on which the management device 102 can be deployed.
- the computing device can be a computing device in a cloud environment (such as a server), or a computing device in an edge environment, or a terminal device, etc., which can be specifically used to implement the functions of the interaction module 1021, mapping module 1022, metadata determination module 1023 and authentication module 1024 in the embodiment shown in Figure 1 above.
- the computing device 1500 includes a processor 1510, a memory 1520, a communication interface 1530, and a bus 1540.
- the processor 1510, the memory 1520, and the communication interface 1530 communicate with each other through the bus 1540.
- the bus 1540 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus.
- the bus may be divided into an address bus, a data bus, a control bus, and the like.
- the bus is represented by only one thick line in FIG. 15, but this does not mean that there is only one bus or one type of bus.
- the communication interface 1530 is used to communicate with the outside, such as obtaining access requests, feeding back metadata of target data, and the like.
- the processor 1510 can be a central processing unit (CPU), an application specific integrated circuit (ASIC), a graphics processing unit (GPU) or one or more integrated circuits.
- the processor 1510 can also be an integrated circuit chip with signal processing capabilities.
- the functions of each module in the management device 102 can be completed by the hardware integrated logic circuit in the processor 1510 or the instructions in the form of software.
- the processor 1510 can also be a general-purpose processor, a digital signal processor (DSP), a field programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, or discrete hardware components, and can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application.
- the general-purpose processor can be a microprocessor or the processor can also be any conventional processor, etc., and the method disclosed in the embodiments of the present application can be directly embodied as a hardware decoding processor for execution, or it can be executed by a combination of hardware and software modules in the decoding processor.
- the software module can be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, register, or other mature storage media in the art.
- the storage medium is located in the memory 1520, and the processor 1510 reads the information in the memory 1520 and completes part or all of the functions in the management device 102 in combination with its hardware.
- the memory 1520 may include a volatile memory, such as a random access memory (RAM).
- the memory 1520 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a HDD, or a SSD.
- the memory 1520 stores executable codes, and the processor 1510 executes the executable codes to execute the method executed by the aforementioned management device 102 .
- when the interaction module 1021, mapping module 1022, metadata determination module 1023 and authentication module 1024 described in the embodiment shown in Figure 1 are implemented by software, the software or program code required to execute the functions of these modules is stored in the memory 1520, the interaction between the interaction module 1021 and other devices is realized through the communication interface 1530, and the processor 1510 executes the instructions in the memory 1520 to implement the method executed by the management device 102.
- FIG. 16 is a schematic diagram showing the structure of a computing device cluster.
- the computing device cluster 160 shown in FIG. 16 includes multiple computing devices, and the management device 102 can be deployed in a distributed manner on multiple computing devices in the computing device cluster 160.
- the computing device cluster 160 includes multiple computing devices 1600, each computing device 1600 includes a memory 1620, a processor 1610, a communication interface 1630 and a bus 1640, wherein the memory 1620, the processor 1610, and the communication interface 1630 are connected to each other through the bus 1640.
- the processor 1610 may be a CPU, a GPU, an ASIC, or one or more integrated circuits.
- the processor 1610 may also be an integrated circuit chip with signal processing capabilities. In the implementation process, some functions of the management device 102 may be completed by the hardware integrated logic circuit or software instructions in the processor 1610.
- the processor 1610 may also be a DSP, an FPGA, a general processor, other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, and may implement or execute some methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
- the general processor may be a microprocessor or the processor may also be any conventional processor, etc.
- the steps of the method disclosed in the embodiments of the present application may be directly embodied as a hardware decoding processor for execution, or may be executed by a combination of hardware and software modules in the decoding processor.
- the software module may be located in a mature storage medium in the art such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, etc.
- the storage medium is located in the memory 1620.
- the processor 1610 reads the information in the memory 1620, and in combination with its hardware, some functions of the management device 102 may be completed.
- the memory 1620 may include ROM, RAM, static storage device, dynamic storage device, hard disk (such as SSD, HDD), etc.
- the memory 1620 may store program codes, for example, part or all of the program codes for implementing the interaction module 1021, part or all of the program codes for implementing the mapping module 1022, part or all of the program codes for implementing the metadata determination module 1023, part or all of the program codes for implementing the authentication module 1024, etc.
- the processor 1610 executes part of the method executed by the management apparatus 102 based on the communication interface 1630, such as a part of the computing devices 1600 may be used to execute the method executed by the interaction module 1021, a part of the computing devices 1600 may be used to execute the method executed by the mapping module 1022, a part of the computing devices 1600 may be used to execute the method executed by the metadata determination module 1023, and a part of the computing devices 1600 may be used to execute the method executed by the authentication module 1024.
- the memory 1620 can also store data. For example: intermediate data or result data generated by the processor 1610 during the execution process, such as the above-mentioned first metadata, second metadata, first permission information, second permission information, etc.
- the communication interface 1630 in each computing device 1600 is used for external communication, such as interacting with other computing devices 1600.
- the bus 1640 may be a peripheral component interconnect standard bus or an extended industry standard architecture bus, etc.
- the bus 1640 in each computing device 1600 in FIG. 16 is represented by only one thick line, but this does not mean that there is only one bus or one type of bus.
- the plurality of computing devices 1600 establish communication paths through a communication network to implement the functions of the management apparatus 102.
- Any computing device may be a computing device in a cloud environment (eg, a server), or a computing device in an edge environment, or a terminal device.
- an embodiment of the present application also provides a computer-readable storage medium, which stores instructions.
- when the instructions are run on one or more computing devices, the one or more computing devices execute the methods executed by the various modules of the management device 102 of the above embodiment.
- the embodiment of the present application further provides a computer program product, and when the computer program product is executed by one or more computing devices, the one or more computing devices execute any of the aforementioned data processing methods.
- the computer program product may be a software installation package, and when any of the aforementioned data processing methods is required, the computer program product may be downloaded and executed on a computer.
- the device embodiments described above are merely schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
- the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
- the technical solution of the present application is essentially or the part that contributes to the prior art can be embodied in the form of a software product, which is stored in a readable storage medium, such as a computer floppy disk, a U disk, a mobile hard disk, a ROM, a RAM, a disk or an optical disk, etc., including a number of instructions to enable a computer device (which can be a personal computer, a training device, or a network device, etc.) to execute the methods described in each embodiment of the present application.
- all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof.
- all or part of the embodiments may be implemented in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- the computer instructions may be transmitted from one website, computer, training equipment, or data center to another website, computer, training equipment, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means.
- the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or a data center, that integrates one or more available media.
- the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)).
Abstract
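The present application provides a data processing method. A management device in a data processing system receives an access request, sent by a computing engine, for the metadata of target data stored in a storage device and, in response to the access request, determines the metadata of the target data according to a first mapping relationship between a second metadata model adapted to the computing engine and a first metadata model built into the management device. The management device also authenticates the access request according to a second mapping relationship between a second permission model adapted to the computing engine and a first permission model built into the management device, and sends the metadata to the computing engine after the access request passes authentication. Based on the mapping between metadata models and the mapping between permission models, the computing engine can be adapted to the metadata model and permission model built into the management device, which improves the extensibility of the data processing system in docking with computing engines. The present application further provides a corresponding system, apparatus, and related device.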
Description
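This application claims priority to Chinese Patent Application No. 202211446388.9, filed with the China National Intellectual Property Administration on November 18, 2022 and entitled "Data processing method, system, apparatus, and related device", which is incorporated herein by reference in its entirety.
This application relates to the field of big data technologies, and in particular to a data processing method, system, apparatus, and related device.
A storage-compute separation architecture is a layered architecture in which storage capability and computing capability are separated and interconnected through a network. It has become one of the mainstream technology trends in recent years. The architecture includes a storage layer and a computing layer. The storage layer includes at least one piece of storage hardware and persistently stores data; in practice, the volume of data stored in the storage layer is large and forms a data lake. The computing layer includes at least one computing engine, which reads and writes data in the storage layer and performs the corresponding computation.
At present, in most storage-compute separation architectures, the metadata corresponding to the data in the storage layer is deployed in the computing layer. As a result, when the computing layer includes multiple computing engines, the metadata must be copied into multiple copies and configured in each computing engine so that the engines can share the data in the same storage layer. However, copying and migrating metadata between different computing engines creates redundant data and easily causes data inconsistency.
To address this, a management layer (also called a data analysis layer) can be added to the storage-compute separation architecture. The management layer is connected to the computing layer and the storage layer through the network and manages, in a unified way, the metadata corresponding to the data in the storage layer. Each computing engine in the computing layer obtains metadata through the metadata model and the permission model in the management layer, and uses that metadata to read and write data in the storage layer. Here, the metadata model is the metadata structure used by the management layer, and the permission model is the permission definition corresponding to that metadata structure.
However, when docking with the computing engines in the computing layer, the management layer requires each computing engine to be adapted to the metadata model and permission model that are fixedly configured in the management layer. Consequently, some computing engines cannot adapt to the built-in metadata model and permission model and have difficulty accessing the storage layer, which limits the extensibility of the management layer in docking with computing engines.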
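Summary
In view of this, embodiments of this application provide a data processing method to improve the extensibility of a data processing system in docking with computing engines. This application also provides a corresponding data processing system, management device, computing device cluster, computer-readable storage medium, and computer program product.
In a first aspect, an embodiment of this application provides a data processing method applied to a data processing system. The data processing system includes a computing engine, a management device, and a storage device. The management device has a built-in first metadata model and first permission model, both adapted to the storage device. When the computing engine requests access to target data stored in the storage device, the management device receives an access request, sent by the computing engine, for the metadata of the target data and, in response to the access request, determines the metadata of the target data according to a first mapping relationship. The first mapping relationship is the mapping between the first metadata model and a second metadata model, the second metadata model is adapted to the computing engine, and the determined metadata of the target data satisfies the second metadata model. The management device also authenticates the access request according to a second mapping relationship, which is the mapping between the first permission model and a second permission model adapted to the computing engine. After the access request passes authentication, the management device sends the metadata of the target data to the computing engine.
In this way, because the first metadata model and the first permission model in the management device are mapped to the second metadata model and the second permission model to which the computing engine is adapted, the computing engine can, through the mapping between metadata models and the mapping between permission models, be adapted to the metadata model and permission model built into the management device. This removes the adaptation constraint that the built-in models impose on computing engines and improves the extensibility of the data processing system in docking with computing engines.
In a possible implementation, the management device may also receive a metadata update request sent by the computing engine and, in response, translate the original metadata carried in the request into target metadata according to the first mapping relationship. The original metadata satisfies the second metadata model and the target metadata satisfies the first metadata model. The management device authenticates the metadata update request according to the second mapping relationship and, after the request passes authentication, updates the target metadata to the management device, for example by persistently storing it. In this way, the computing engine can update metadata in the management device through the mapping relationships between metadata models and permission models, so that it can subsequently write new data to the storage device.
In a possible implementation, the management device may also output a configuration interface, which may for example be presented to the user through a client that the data processing system provides externally. In response to a first operation performed by the user on the configuration interface, the management device establishes the first mapping relationship between the first metadata model and the second metadata model; in response to a second operation performed by the user on the configuration interface, it establishes the second mapping relationship between the first permission model and the second permission model. The user can thus establish both mappings through interface interaction, which makes configuring the mapping relationships more convenient.
In a possible implementation, the management device may also, in response to a third operation performed by the user on the configuration interface, generate an access control policy for the second metadata model and the second permission model, so that the access control policy can be used to constrain the access operations of the computing engine. Interface interaction again makes configuration more convenient for the user.
In a possible implementation, the data processing system includes multiple kinds of computing engines, and the management device includes the metadata model and permission model to which each kind of computing engine is adapted. In this way, the management device can reduce the difficulty of docking with the multiple kinds of computing engines, which further improves extensibility with respect to computing engines.
In a possible implementation, when determining the metadata of the target data according to the first mapping relationship, the management device may first read first metadata according to the access request, where the first metadata satisfies the first metadata model, and then translate the first metadata, according to the first mapping relationship, into second metadata that satisfies the second metadata model (that is, the aforementioned metadata of the target data). When authenticating the access request, the management device may first translate, according to the second mapping relationship, the first permission information in the access request into second permission information that satisfies the first permission model, where the first permission information satisfies the second permission model, and then authenticate the second permission information. The management device then sends the second metadata to the computing engine after the second permission information passes authentication. In this way, the management device determines the metadata the computing engine needs based on the first mapping relationship and authenticates the computing engine based on the second mapping relationship, which adapts the management device to the computing engine and improves the extensibility of the data processing system in docking with computing engines.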
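In a second aspect, this application provides a data processing system that includes a computing engine, a management device, and a storage device. The management device has a built-in first metadata model and first permission model, both adapted to the storage device. The computing engine is configured to generate an access request for the metadata of target data and send the access request to the management device. The management device is configured to, in response to the access request, determine the metadata of the target data according to a first mapping relationship, authenticate the access request according to a second mapping relationship, and send the metadata of the target data to the computing engine after the access request passes authentication. The first mapping relationship is the mapping between the first metadata model and a second metadata model, the second metadata model is adapted to the computing engine, the metadata of the target data satisfies the second metadata model, and the second mapping relationship is the mapping between the first permission model and a second permission model. The computing engine is further configured to read the target data stored in the storage device according to the metadata of the target data.
In a possible implementation, the computing engine is further configured to generate a metadata update request and send it to the management device. The management device is further configured to, in response to the metadata update request, translate the original metadata carried in the request into target metadata according to the first mapping relationship, where the original metadata satisfies the second metadata model and the target metadata satisfies the first metadata model, authenticate the metadata update request according to the second mapping relationship, and update the target metadata to the management device after the request passes authentication.
In a possible implementation, the management device is further configured to output a configuration interface, establish the first mapping relationship in response to a first operation performed by the user on the configuration interface, and establish the second mapping relationship in response to a second operation performed by the user on the configuration interface.
In a possible implementation, the management device is further configured to generate, in response to a third operation performed by the user on the configuration interface, an access control policy for the second metadata model and the second permission model.
In a possible implementation, the data processing system includes multiple kinds of computing engines, and the management device includes the metadata model and permission model to which each kind of computing engine is adapted.
In a possible implementation, the metadata of the target data is second metadata, and the management device is specifically configured to: read first metadata according to the access request, where the first metadata satisfies the first metadata model; translate the first metadata into second metadata that satisfies the second metadata model according to the first mapping relationship; translate, according to the second mapping relationship, the first permission information in the access request into second permission information that satisfies the first permission model, where the first permission information satisfies the second permission model; authenticate the second permission information; and send the second metadata to the computing engine after the second permission information passes authentication.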
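In a third aspect, this application provides a management device applied to a data processing system. The data processing system further includes a computing engine and a storage device. The management device has a built-in first metadata model and first permission model, both adapted to the storage device. The management device includes: an interaction module, configured to receive an access request, sent by the computing engine, for the metadata of target data stored in the storage device; a metadata determination module, configured to, in response to the access request, determine the metadata of the target data according to a first mapping relationship, where the first mapping relationship is the mapping between the first metadata model and a second metadata model, the second metadata model is adapted to the computing engine, and the metadata of the target data satisfies the second metadata model; and an authentication module, configured to authenticate the access request according to a second mapping relationship, where the second mapping relationship is the mapping between the first permission model and a second permission model adapted to the computing engine. The interaction module is further configured to send the metadata of the target data to the computing engine after the access request passes authentication.
In a possible implementation, the interaction module is further configured to receive a metadata update request sent by the computing engine; the metadata determination module is further configured to, in response to the metadata update request, translate the original metadata carried in the request into target metadata according to the first mapping relationship, where the original metadata satisfies the second metadata model and the target metadata satisfies the first metadata model; the authentication module is further configured to authenticate the metadata update request according to the second mapping relationship; and the interaction module is further configured to update the target metadata to the management device after the metadata update request passes authentication.
In a possible implementation, the interaction module is further configured to output a configuration interface, and the management device further includes a mapping module, configured to establish the first mapping relationship in response to a first operation performed by the user on the configuration interface and to establish the second mapping relationship in response to a second operation performed by the user on the configuration interface.
In a possible implementation, the mapping module is further configured to generate, in response to a third operation performed by the user on the configuration interface, an access control policy for the second metadata model and the second permission model.
In a possible implementation, the data processing system includes multiple kinds of computing engines, and the management device includes the metadata model and permission model to which each kind of computing engine is adapted.
In a possible implementation, the metadata of the target data is second metadata. The metadata determination module is specifically configured to read first metadata according to the access request, where the first metadata satisfies the first metadata model, and translate the first metadata into second metadata that satisfies the second metadata model according to the first mapping relationship. The authentication module is specifically configured to translate, according to the second mapping relationship, the first permission information in the access request into second permission information that satisfies the first permission model, where the first permission information satisfies the second permission model, and authenticate the second permission information. The interaction module is specifically configured to send the second metadata to the computing engine after the second permission information passes authentication.
It is worth noting that the management device provided in the third aspect corresponds to the data processing method provided in the first aspect, so for the technical effects of the third aspect and of any implementation of the third aspect, refer to the technical effects of the first aspect or of the corresponding implementation of the first aspect.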
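In a fourth aspect, this application provides a computing device cluster. The computing device cluster includes at least one computing device, and the at least one computing device includes at least one processor and at least one memory. The at least one memory is configured to store instructions, and the at least one processor executes the instructions stored in the at least one memory so that the computing device cluster performs the data processing method in the first aspect or in any possible implementation of the first aspect. It should be noted that the memory may be integrated in the processor or may be independent of the processor. The at least one computing device may further include a bus, through which the processor is connected to the memory. The memory may include a read-only memory and a random access memory.
In a fifth aspect, this application provides a computer-readable storage medium that stores instructions which, when run on at least one computing device, cause the at least one computing device to perform the method described in the first aspect or in any implementation of the first aspect.
In a sixth aspect, this application provides a computer program product containing instructions which, when run on at least one computing device, cause the at least one computing device to perform the method described in the first aspect or in any implementation of the first aspect.
On the basis of the implementations provided in the above aspects, this application may further combine them to provide more implementations.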
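To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments recorded in this application, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a schematic structural diagram of an exemplary data processing system provided by this application;
FIG. 2 is a schematic diagram of the data structure of the metadata model 1 built into the management device 102 provided by this application;
FIG. 3 is a schematic diagram of the permission model 1 built into the management device 102 provided by this application;
FIG. 4 is a schematic diagram of an exemplary access control policy provided by this application;
FIG. 5 is a schematic diagram of the data structure of a metadata model 2 adapted to the computing engine 1012 provided by this application;
FIG. 6 is a schematic diagram of the data structure of another metadata model 2 adapted to the computing engine 1012 provided by this application;
FIG. 7 is a schematic diagram of a permission model 2 adapted to the computing engine 1012 provided by this application;
FIG. 8 is a schematic diagram of another permission model 2 adapted to the computing engine 1012 provided by this application;
FIG. 9 is a schematic diagram of establishing a mapping between the metadata model 1 and the metadata model 2 provided by this application;
FIG. 10 is a schematic diagram of establishing a mapping between the permission model 1 and the permission model 2 provided by this application;
FIG. 11 is a schematic diagram of establishing another mapping between the permission model 1 and the permission model 2 provided by this application;
FIG. 12 is a schematic diagram of an access control policy defined for the metadata model 2 provided by this application;
FIG. 13 is a schematic diagram of another access control policy defined for the metadata model 2 provided by this application;
FIG. 14 is a schematic flowchart of an exemplary data processing method provided by this application;
FIG. 15 is a schematic structural diagram of a computing device provided by this application;
FIG. 16 is a schematic structural diagram of a computing device cluster provided by this application.
The solutions in the embodiments provided by this application are described below with reference to the accompanying drawings of this application.
The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the way objects with the same attributes are distinguished when describing the embodiments of this application.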
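Refer to FIG. 1, which is a schematic structural diagram of an exemplary data processing system. As shown in FIG. 1, the data processing system 100 includes a computing apparatus 101, a management device 102, and a storage device 103, which can be communicatively connected to one another through a network.
The computing apparatus 101 may include one or more kinds of computing engines, such as a structured query language (SQL) computing engine, an artificial intelligence (AI) computing engine, or a third-party computing engine. A computing engine may come from the open source community or may be a commercial engine released by a cloud vendor. Taking SQL computing engines as an example, an SQL computing engine may be a Presto engine, a Hive engine, a Spark engine, a Clickhouse engine, and so on, and each of these engine types exists in both open source community versions and commercial versions. For ease of understanding, FIG. 1 takes as an example a computing apparatus 101 that includes a computing engine 1011 and a computing engine 1012, which are computing engines of different types; in other embodiments, the computing apparatus 101 may include any number and any type of computing engines. The computing apparatus 101 reads and writes data in the storage device 103 through the computing engines it includes.
The management device 102 has a fixed built-in metadata model 1 and permission model 1, both adapted to the storage device 103, so that the management device 102 can use the metadata model 1 to manage the metadata corresponding to the data stored in the storage device 103 and use the permission model 1 to authenticate processing operations on that metadata. Metadata describes attribute information of the data stored in the storage device 103, such as the catalog and database the data belongs to, its storage location, its storage format, the compression algorithm used, and the partition it belongs to.
The storage device 103 persistently stores data, such as data uploaded by one or more users. In practice, the storage device 103 may store data in a data block format, a file format, or an object format, or use other storage approaches such as columnar storage or message queues; this embodiment does not limit this.
When a computing engine in the computing apparatus 101 (such as the SQL computing engine 1011 or the AI computing engine 1012) needs to access data in the storage device 103, it usually first obtains the metadata corresponding to the data through the management device 102 and then accesses the data in the storage device 103 based on that metadata. Therefore, when a computing engine is deployed in the computing apparatus 101, the engine must be adapted to the metadata model 1 and the permission model 1 built into the management device 102; otherwise, the engine cannot interact with the storage device 103 through the management device 102. In practical scenarios, some computing engines may need a fairly complex adaptation process before they can dock with the management device 102, which is difficult and time-consuming, and some computing engines cannot adapt to the built-in metadata model 1 and permission model 1 at all. These constraints limit the extensibility of the management device 102 in the data processing system 100 in docking with computing engines.
For this reason, this application docks a computing engine with the management device 102 by mapping the metadata model 1 and the permission model 1 in the management device 102 to a metadata model 2 and a permission model 2 that the computing engine can adapt to. In a specific implementation, as shown in FIG. 1, the metadata model 2 and the permission model 2 that the computing engine can adapt to are deployed in the management device 102, a mapping relationship 1 is established between the metadata model 2 and the metadata model 1, and a mapping relationship 2 is established between the permission model 2 and the permission model 1. A computing engine in the computing apparatus 101 can then be indirectly adapted to the built-in metadata model 1 through the metadata model 2 it is adapted to, and indirectly adapted to the built-in permission model 1 through the permission model 2 it is adapted to, so that the computing engine can access the management device 102. This not only removes the adaptation constraint that the built-in models of the management device 102 impose on computing engines and improves the extensibility of the data processing system 100 in docking with computing engines; it also means that adapting a computing engine to the management device 102 only requires deploying the metadata model 2 and the permission model 2 in the management device 102 and establishing the mappings between these models and the built-in models, which effectively reduces the difficulty and time of adapting a computing engine.
As one example, the data processing system 100 may be deployed in the cloud to provide users with data processing cloud services, such as data computing and data storage cloud services. In this case, the computing apparatus 101, the management device 102, and the storage device 103 in the data processing system 100 may each be implemented by a computing device or a computing device cluster in the cloud; they may also be deployed on the same computing device or in the same computing device cluster. As another example, the data processing system 100 may be deployed locally to provide users with local data processing services.
In practice, the computing apparatus 101 and the management device 102 in the data processing system 100 may each be implemented by software or by hardware.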
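Take the computing apparatus 101 as an example. As an example of a software functional unit, the computing apparatus 101 may include code running on a computing instance. The computing instance may include at least one of a host, a virtual machine, and a container, and there may be one or more such computing instances. For example, the computing apparatus 101 may include code running on multiple hosts, virtual machines, or containers. It should be noted that the multiple hosts, virtual machines, or containers that run the code may be distributed in the same region or in different regions. Further, they may be distributed in the same availability zone (AZ) or in different AZs, where each AZ includes one data center or multiple geographically close data centers. A region usually includes multiple AZs.
Likewise, the multiple hosts, virtual machines, or containers that run the code may be distributed in the same virtual private cloud (VPC) or in multiple VPCs. A VPC is usually set up within one region. For communication between two VPCs in the same region, and for cross-region communication between VPCs in different regions, a communication gateway must be set up in each VPC, and the VPCs are interconnected through the communication gateways.
As an example of a hardware functional unit, the computing apparatus 101 may include at least one computing device, such as a server. Alternatively, the computing apparatus 101 may be a device implemented with an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be implemented as a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), a data processing unit (DPU), or any combination thereof.
When the computing apparatus 101 includes multiple computing devices, the multiple computing devices may be distributed in the same region or in different regions, in the same AZ or in different AZs, and in the same VPC or in multiple VPCs. The multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
The management device 102 is similar to the computing apparatus 101: when implemented by software, the management device 102 may be code running on a computing instance; when implemented by hardware, the management device 102 may include one or more computing devices.
The storage device 103 is implemented by hardware and may include at least one piece of storage hardware with data storage capability. For example, it may include one or more storage servers, or hardware with a persistent storage medium, such as a hard disk (for example, an SSD or an HDD).
It is worth noting that the data processing system 100 shown in FIG. 1 is only an illustrative example; in practice, the data processing system 100 may be implemented in other ways. For example, the management device 102 in the data processing system 100 may include more, or more types of, metadata models and permission models; or the metadata model 2 and the permission model 2 adapted to the computing engines in the computing apparatus 101 may be configured outside the management device 102, for example in the computing apparatus 101. This embodiment does not limit this.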
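For ease of understanding, various non-limiting specific implementations of the data processing procedure in the data processing system 100 are described in detail below.
In the initial state, the management device 102 in the data processing system 100 is fixedly configured with the metadata model 1 and the permission model 1, which may for example be the metadata model and permission model that the open source Hive engine is adapted to.
As an example, the metadata model 1 built into the management device 102 uses the metadata structure shown in FIG. 2. As shown in FIG. 2, this metadata model includes information such as catalog, database, table, function, column, partition, row, and location.
A catalog is the top-level data structure resource in the storage device 103, that is, the largest namespace. Usually, one catalog can include N1 databases, where N1 is a natural number.
A database is the data structure resource one level below a catalog. The resources below a database are data tables and functions, and one database can include N2 data tables and N3 functions, where N2 and N3 are both natural numbers.
A data table, which usually includes views and indexes, is a resource below a database. The resources below a data table have two dimensions: one is the vertical data organization, "column", and the other is the horizontal data organization, "partition" or "row". Both the data table and its lower-level resource "partition" are mapped to a "location", which indicates where the underlying data of the data table and the partition is stored in the storage device 103.
A function, which includes built-in functions and user defined scalar functions (UDFs), is a resource below a database and is mapped to a "location". This "location" indicates where the underlying data of the software package (JAR package) of the function's implementation class (such as a Java implementation class) is stored.
In practice, the metadata model 1 may also use other metadata structures adapted to the storage device 103; this embodiment does not limit this.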
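The hierarchy of the metadata model 1 can be pictured as nested records. The following Python sketch is only an illustration under assumptions: the class and field names are invented for the example, and rows, views, and indexes are left out for brevity. It shows a catalog containing databases, a database containing tables and functions, and tables and partitions each mapped to a location.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Partition:
    name: str
    location: str  # storage location of the partition's underlying data


@dataclass
class Table:
    name: str
    location: str  # storage location of the table's underlying data
    columns: List[str] = field(default_factory=list)
    partitions: Dict[str, Partition] = field(default_factory=dict)


@dataclass
class Function:
    name: str
    location: str  # storage location of the package that implements the function


@dataclass
class Database:
    name: str
    tables: Dict[str, Table] = field(default_factory=dict)
    functions: Dict[str, Function] = field(default_factory=dict)


@dataclass
class Catalog:
    name: str
    databases: Dict[str, Database] = field(default_factory=dict)


def resolve_location(catalog: Catalog, database: str, table: str,
                     partition: Optional[str] = None) -> Optional[str]:
    """Walk catalog -> database -> table (-> partition) and return the location."""
    db = catalog.databases.get(database)
    tbl = db.tables.get(table) if db else None
    if tbl is None:
        return None
    if partition is None:
        return tbl.location
    part = tbl.partitions.get(partition)
    return part.location if part else None
```

A computing engine that has received such metadata could use the resolved location to read the underlying data from the storage device.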
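As an example, the permission model 1 built into the management device 102 may be as shown in FIG. 3.
The permissions for a catalog may include "all", "create database", "alter", "create catalog", "list all databases", and so on.
The permissions for a database may include "all", "create table", "alter", "drop" (deletes the structure), "describe", "list table", "list function", "list database", "list all tables", and so on.
The permissions for a data table may include "all", "alter", "drop" (deletes the structure), "describe", "update", "insert", "delete" (deletes data), "select", and so on.
The permissions for a column may include "all", "select", and so on.
The permissions for a function may include "all", "create", "execute", "drop" (deletes the structure), and so on.
The permissions for a location may include "read", "write", and so on.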
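Further, the management device 102 may also include an access control policy for the metadata model 1 and the permission model 1. The policy indicates the permissions a computing engine must hold when it calls an application programming interface (API) to access the metadata in the management device 102. For example, the access control policy may be as shown in FIG. 4.
List operation on a catalog: listing one or more specified catalogs, or all catalogs, requires the "list" permission or the "all" permission.
Create operation on a catalog: creating a defined catalog requires the global-level "create catalog" permission.
List operation on a database: listing all databases requires the "list all databases" permission on the catalog the databases belong to.
Create operation on a database: creating a defined database requires the "create database" permission on the catalog the database belongs to.
Describe operation on a data table: querying one or more specified data tables requires the "describe table" permission on those data tables.
Create operation on a data table: creating a defined data table requires the "create table" permission on the database the data table belongs to.
List operation on a data table: listing all data tables requires the "list all tables" permission on the database the data tables belong to.
Alter operation on a data table: modifying one or more specified data tables requires the "alter table" permission on those data tables.
List operation on a permission policy: listing all permission policies requires the global-level "list permission policies" permission, where a permission policy indicates the operations that are allowed, the objects of those operations, and the subject requesting them.
Create operation on a permission policy: creating a defined permission policy requires the global-level "create permission policy" permission.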
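An access control policy of the kind listed for FIG. 4 can be represented as a lookup table from an API operation to the permission it requires and the scope on which that permission must be held. The Python sketch below is a hedged illustration: the key names, scope labels, and the deny-by-default behavior are assumptions of this example, and the table-level entries are read as applying to data tables.

```python
# (resource level, operation) -> (scope the permission must be held on, acceptable permissions)
ACCESS_CONTROL_POLICY = {
    ("catalog", "list"): ("catalog", {"list", "all"}),
    ("catalog", "create"): ("global", {"create catalog"}),
    ("database", "list"): ("parent catalog", {"list all databases"}),
    ("database", "create"): ("parent catalog", {"create database"}),
    ("table", "describe"): ("table", {"describe"}),
    ("table", "create"): ("parent database", {"create table"}),
    ("table", "list"): ("parent database", {"list all tables"}),
    ("table", "alter"): ("table", {"alter"}),
    ("policy", "list"): ("global", {"list policies"}),
    ("policy", "create"): ("global", {"create policy"}),
}


def is_allowed(level: str, operation: str, granted: dict) -> bool:
    """granted maps a scope name to the set of permissions held on that scope."""
    rule = ACCESS_CONTROL_POLICY.get((level, operation))
    if rule is None:
        return False  # no rule defined: deny by default (an assumption of this sketch)
    scope, acceptable = rule
    return bool(acceptable & granted.get(scope, set()))
```

For example, `is_allowed("table", "create", {"parent database": {"create table"}})` is true, while holding "create table" on the table itself rather than on its parent database is not enough.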
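When a computing engine is deployed in the computing apparatus 101, suppose the deployed computing engine 1011 can adapt to the metadata model 1 and the permission model 1, for example because the computing engine 1011 is a Hive engine. The computing engine 1011 can then access the management device 102 based on the metadata model 1 and the permission model 1, so it docks with the management device 102 without any additional configuration.
When the computing engine 1012 is deployed in the computing apparatus 101 and is not adapted to the metadata model 1 and the permission model 1 built into the management device 102, for example because the computing engine 1012 is a Presto engine, a metadata model 2 and a permission model 2 adapted to the computing engine 1012 are deployed in the management device 102, a mapping relationship 1 is established between the metadata model 2 and the metadata model 1, and a mapping relationship 2 is established between the permission model 2 and the permission model 1.
The deployed metadata model 2 and permission model 2 may be user-defined models, or may be known models adapted to the computing engine 1012.
For example, the user can request that the computing engine 1012 be deployed in the data processing system 100 and provide the data processing system 100 with a metadata model 2 and a permission model 2 custom-defined for the computing engine 1012.
For instance, the user-defined metadata model may use the metadata structure shown in FIG. 5, in which the metadata model 2 includes catalog, schema, data table, UDF (user defined scalar function), column, partition, row, and location. Alternatively, the user-defined metadata model may use the metadata structure shown in FIG. 6, in which the metadata model 2 includes database, data table, view, row, and location.
The user can also define the permission model 2 shown in FIG. 7. In this case, the permissions for a catalog may include "all" or "administrator". The permissions for a schema include "use", "create", "drop" (structure), and so on. The permissions for a UDF include "all", "create", "drop" (structure), "alter", "select", and so on. The permissions for a data table include "all", "select", "insert", "delete" (data), "alter", "update", and so on (the permissions for the remaining metadata are not shown).
Or the user can define the permission model 2 shown in FIG. 8. In this case, the permissions for a database may include "create". The permissions for a view may include "select", "create", and so on. The permissions for a data table may include "select", "insert", and so on (the permissions for the remaining metadata are not shown).
The user can then define the mapping relationship 1 between the metadata model 2 and the metadata model 1 built into the management device 102.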
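In a specific implementation, the data processing system 100 can present a client externally. The client may for example be an application running on a user-side device, or a web browser provided externally by the data processing system 100. The management device 102 may include an interaction module 1021 and a mapping module 1022. The interaction module 1021 can output a configuration interface to the client, and the client presents the configuration interface to the user. The user can then perform a first operation on the configuration interface to establish the mapping relationship 1 between the metadata model 2 and the metadata model 1. Specifically, the interaction module 1021 feeds the first operation back to the mapping module 1022, and the mapping module 1022, according to the first operation, maps a metadata level in the metadata model 2 to the equivalent metadata level in the metadata model 1. As shown by the dashed lines in FIG. 9, for the metadata model 2 shown in FIG. 5, the mapping module 1022 can, based on the first operation, map the "schema" level in the metadata model 2 to the "database" level in the metadata model 1, or map the "UDF" level in the metadata model 2 to the "function" level in the metadata model 1. When the metadata model 2 uses the metadata structure shown in FIG. 6, the mapping module 1022 can, based on the first operation, map the "database" level in the metadata model 2 to both the "catalog" level and the "database" level in the metadata model 1, or map the "view" level in the metadata model 2 to the "data table" level in the metadata model 1. The mapping module 1022 then maps the remaining metadata levels of the two metadata models based on the levels already mapped.
The user-defined metadata model 2 and the metadata model 1 built into the management device 102 may differ not only in how metadata levels are named but also in metadata structure. Therefore, while establishing the mapping relationship 1, for an attribute that the metadata model 2 has but the metadata model 1 lacks, the mapping module 1022 can map the attribute directly to an existing attribute in the metadata model 1 (that is, to an existing metadata level; the same applies below). For an attribute that the metadata model 2 lacks but the metadata model 1 has, it can map the attribute to an existing attribute in the metadata model 2.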
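One simple way to picture the mapping relationship 1 is as a level-to-level renaming table. The Python sketch below is an assumption-laden illustration for the FIG. 5 style of metadata model 2, where every level has exactly one counterpart; the dictionary layout and the function names are invented here. It does not cover the FIG. 6 case, where one level of the metadata model 2 maps to two levels of the metadata model 1.

```python
# Hypothetical mapping relationship 1 for the FIG. 5 style of metadata model 2:
# each level of metadata model 2 -> the equivalent level of metadata model 1.
MAPPING_2_TO_1 = {
    "catalog": "catalog",
    "schema": "database",
    "table": "table",
    "udf": "function",
    "column": "column",
    "partition": "partition",
    "row": "row",
    "location": "location",
}


def invert(mapping: dict) -> dict:
    """Build the reverse direction; only valid when the mapping is one-to-one."""
    inverse = {target: source for source, target in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not one-to-one and cannot be inverted")
    return inverse


def translate(metadata: dict, mapping: dict) -> dict:
    """Rename each metadata level according to the mapping, keeping the values."""
    return {mapping[level]: value for level, value in metadata.items()}
```

Translating `{"catalog": "c1", "schema": "sales", "table": "orders"}` with `MAPPING_2_TO_1` yields the model 1 form with a `"database"` key, and translating back with `invert(MAPPING_2_TO_1)` restores the original.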
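The mapping module 1022 also establishes the mapping relationship 2 between the permission model 2 and the permission model 1 built into the management device 102.
In a specific implementation, the user can perform, on the configuration interface presented by the client, a second operation for the mapping relationship between permission models. The interaction module 1021 can feed the second operation back to the mapping module 1022, and the mapping module 1022, according to the second operation, establishes the mapping relationship 2 between the operation permissions of each metadata level in the metadata model 2 and the operation permissions of each metadata level in the metadata model 1.
For example, as shown in FIG. 10, the mapping module 1022 can establish a mapping between the permission model 2 shown in FIG. 7 and the permission model 1. The catalog operation permissions "all" and "administrator" in the permission model 2 can be mapped to operation permissions such as "all", "create database", and "alter" in the permission model 1. The mapping between the catalog operation permissions in the permission model 2 and those in the permission model 1 can be one-to-one, many-to-one, or one-to-many, set according to the needs of the actual application; this embodiment does not limit this. The mappings between the operation permissions of the other metadata levels can be established in a similar way.
As another example, as shown in FIG. 11, the mapping module 1022 can establish a mapping between the permission model 2 shown in FIG. 8 and the permission model 1. The data table operation permissions "select" and "insert" in the permission model 2 can be mapped to operation permissions such as "select", "insert", "drop" (structure), and "describe" in the permission model 1. Depending on the needs of the actual application, all operation permissions in the permission model 1 can be mapped, or only some of them; for example, only a one-to-one mapping between the data table operation permissions "select" and "insert" in the permission model 2 and the operation permissions "select" and "insert" in the permission model 1 may be established. The mappings between the operation permissions of the other metadata levels can be established in a similar way.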
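Because the mapping between permissions may be one-to-one, many-to-one, or one-to-many, a set-valued table is a natural representation. The following Python sketch is illustrative only: the specific pairs in the table are example choices, not the mapping drawn in FIG. 10 or FIG. 11, and the function names are assumptions.

```python
# Hypothetical mapping relationship 2: (level, permission) in permission model 2
# -> set of (level, permission) pairs in permission model 1.
MAPPING_2 = {
    ("catalog", "all"): {("catalog", "all")},
    ("catalog", "administrator"): {("catalog", "create database"), ("catalog", "alter")},
    ("table", "select"): {("table", "select")},
    ("table", "insert"): {("table", "insert")},
}


def to_model_1(level: str, permission: str) -> set:
    """Translate one model 2 permission into the model 1 permissions it maps to."""
    return set(MAPPING_2.get((level, permission), set()))


def to_model_2(level: str, permission: str) -> set:
    """Reverse lookup: every model 2 permission that maps onto a model 1 permission."""
    return {source for source, targets in MAPPING_2.items() if (level, permission) in targets}
```

Here "administrator" is a one-to-many entry, while "select" and "insert" are one-to-one; a permission with no entry translates to the empty set, which a caller would treat as not granted.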
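In practice, besides the configuration interface, the user can also provide the metadata model 2, the permission model 2, and the mapping relationships between the models to the management device 102 through an API, a software development kit (SDK), a command line, template import, or other means; this embodiment does not limit this.
In a further possible implementation, the user can also define, for the computing engine 1012 to be deployed, an access control policy for calling the metadata model 2 and the permission model 2. In a specific implementation, the user can perform a third operation for the access control policy on the configuration interface presented by the client. The interaction module 1021 can feed the third operation back to the mapping module 1022, and the mapping module 1022 creates the corresponding access control policy according to the third operation. Specifically, for the create, delete, update, and query APIs of each metadata level in the metadata model 2, and based on the permission model 2 corresponding to the metadata model 2, it defines the permission each API requires at that metadata level or at the level above it, thereby generating the corresponding access control policy. Further, for the create, delete, update, and query APIs of the permission policies corresponding to the permission model 2, the mapping module 1022 can also define the permission each API requires based on the permission model 2, to generate the corresponding access control policy.
For example, when the metadata model 2 uses the metadata structure shown in FIG. 5, the access control policy that the mapping module 1022 generates based on the third operation may be as shown in FIG. 12.
List operation on a catalog: listing one or more specified catalogs, or all catalogs, requires the "all" permission or the "administrator" permission.
Create operation on a catalog: creating a defined catalog requires the global-level "administrator" permission.
List operation on a schema: listing one or more specified schemas, or all schemas, requires the "all" permission or the "use" permission on the catalog the schemas belong to.
Create operation on a schema: creating a defined schema requires the "create" permission on the catalog the schema belongs to.
Describe operation on a UDF: querying one or more specified UDFs requires the "select" permission on those UDFs.
Create operation on a UDF: creating a defined UDF requires the "create" permission on the schema the UDF belongs to.
List operation on a data table: listing all data tables requires the "select" permission on the schema the data tables belong to.
Alter operation on a data table: modifying one or more specified data tables requires the "alter" permission on those data tables.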
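As another example, when the metadata model 2 uses the metadata structure shown in FIG. 6, the access control policy that the mapping module 1022 generates based on the third operation may be as shown in FIG. 13.
List operation on a database: listing one or more specified databases, or all databases, requires the global-level "create" permission.
Create operation on a database: creating a defined database requires the global-level "create" permission.
Describe operation on a data table: describing one or more specified data tables requires the "select" permission on those data tables.
Create operation on a data table: creating a defined data table requires the "create" permission on the database the data table belongs to.
List operation on a data table: querying one or more specified data tables requires the "select" permission on the database the data tables belong to.
Alter operation on a data table: modifying one or more specified data tables requires the "alter" permission on those data tables.
List operation on a view: listing all views requires the "create" permission on the database the views belong to.
Describe operation on a view: querying one or more specified views requires the "describe" permission on those views.
In practice, the access control policy that the mapping module 1022 defines based on the third operation performed by the user may also be implemented in other ways; this embodiment does not limit this.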
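After the management device 102 has been configured with the metadata model 2, the permission model 2, and the mapping relationships between the models (and the access control policy), it can provide the computing engine 1012 with the metadata corresponding to the data it accesses in the storage device 103, or save the metadata corresponding to the data that the computing engine 1012 writes to the storage device 103. These two procedures are described in detail below.
When the computing engine 1012 needs to read data stored in the storage device 103 (hereinafter called the target data), the computing engine 1012 can generate an access request for the metadata of the target data to be read and send the access request to the management device 102. Specifically, it can call the API of the metadata model 2 and the permission model 2 in the management device 102, so that the management device 102 receives the access request through that API.
The management device 102 includes the interaction module 1021, a metadata determination module 1023, and an authentication module 1024, as shown in FIG. 1. After receiving the access request sent by the computing engine 1012, the interaction module 1021 can provide it to the metadata determination module 1023 and the authentication module 1024. In response to the access request, the metadata determination module 1023 determines the metadata corresponding to the target data according to the metadata model 2 and the metadata model 1 and feeds the metadata back to the interaction module 1021. The authentication module 1024 authenticates the access request according to the permission model 2 and the permission model 1 and feeds the authentication result back to the interaction module 1021. When the authentication result indicates that the access request has passed authentication, the interaction module 1021 sends the metadata corresponding to the target data to the computing engine 1012.
In a possible implementation, the access request received by the interaction module 1021 carries indication information for the target data, the operation to be performed, and the identifier of the computing engine 1012. When responding to the access request, the metadata determination module 1023 can determine, according to the indication information carried in the access request, the metadata that the computing engine 1012 wants to access; for ease of distinction, this is hereinafter called the first metadata. Because the first metadata satisfies the metadata model 1, and the metadata model 1 is not adapted to the computing engine 1012, the metadata determination module 1023 can translate the first metadata into second metadata that satisfies the metadata model 2 according to the mapping relationship 1 between the metadata model 1 and the metadata model 2. Since the metadata model 2 is adapted to the computing engine 1012, the computing engine 1012 can recognize the second metadata based on the data structure used by the metadata model 2.
In this embodiment, after generating the second metadata, the management device 102 does not send it directly to the computing engine 1012; it sends the second metadata to the computing engine only if the access request passes authentication. In a specific implementation, the authentication module 1024 in the management device 102 determines the first permission information carried in the access request. The first permission information may include the identifier of the computing engine 1012, the requested operation (query), and the indication information of the target data (which also indicates the metadata corresponding to the target data), and it usually satisfies the permission model 2. The authentication module 1024 can therefore translate the first permission information into second permission information that satisfies the permission model 1 according to the mapping relationship 2 between the permission model 2 and the permission model 1, and authenticate the second permission information using the pre-configured permission policy (and access control policy) to determine whether the computing engine 1012 has permission to perform the operation on the metadata. It then generates an authentication result for the access request and feeds it back to the interaction module 1021. When the authentication result indicates that the access request has passed authentication, the interaction module 1021 sends the second metadata translated by the metadata determination module 1023 to the computing engine 1012; when the authentication result indicates that the access request has not passed authentication, the interaction module 1021 can return to the computing engine 1012 a response indicating that the request failed or that authentication failed.
The implementation above takes as an example the metadata determination module 1023 and the authentication module 1024 generating the second metadata and authenticating the access request in parallel. In other possible implementations, the authentication module 1024 may first authenticate the access request and feed the authentication result back to the metadata determination module 1023, and the metadata determination module 1023 generates the second metadata only after determining that the authentication result indicates that the access request has passed authentication.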
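The parallel variant of the read path can be sketched in Python with two concurrent tasks whose results are combined by the interaction step. Everything named here is hypothetical: the in-memory metadata store, the two mapping tables, the grants table, and the request dictionary layout are assumptions chosen to keep the example small.

```python
from concurrent.futures import ThreadPoolExecutor

# First metadata as stored in the management device (satisfies metadata model 1).
METADATA_STORE = {
    "orders": {"catalog": "c1", "database": "sales", "table": "orders",
               "location": "/lake/sales/orders"},
}
# Hypothetical mapping relationship 1, direction model 1 -> model 2.
LEVEL_MAP_1_TO_2 = {"catalog": "catalog", "database": "schema",
                    "table": "table", "location": "location"}
# Hypothetical mapping relationship 2, direction model 2 -> model 1.
PERMISSION_MAP_2_TO_1 = {"select": "select"}
# Pre-configured permission policy, expressed in permission model 1 terms.
GRANTS = {("engine-1012", "orders"): {"select"}}


def determine_metadata(request: dict) -> dict:
    """Read the first metadata and translate it into second metadata (model 2)."""
    first_metadata = METADATA_STORE[request["target"]]
    return {LEVEL_MAP_1_TO_2[level]: value for level, value in first_metadata.items()}


def authenticate(request: dict) -> bool:
    """Translate first permission info into second permission info and check it."""
    second_permission = PERMISSION_MAP_2_TO_1.get(request["operation"])
    return second_permission in GRANTS.get((request["engine_id"], request["target"]), set())


def handle_access(request: dict) -> dict:
    # Translation and authentication run in parallel; metadata is released
    # only if authentication passes.
    with ThreadPoolExecutor(max_workers=2) as pool:
        metadata_future = pool.submit(determine_metadata, request)
        auth_future = pool.submit(authenticate, request)
        if not auth_future.result():
            return {"status": "denied"}
        return {"status": "ok", "metadata": metadata_future.result()}
```

The sequential variant would simply call `authenticate` first and call `determine_metadata` only when it returns true, which avoids translating metadata for requests that end up rejected.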
计算引擎1012在获取到目标数据对应的第二元数据后,可以根据该第二元数据访问存储装置103中存储的目标数据。示例性地,计算引擎1012可以根据该第二元数据直接访问存储装置103,以获得所要读取的目标数据;或者,计算引擎1012可以根据该第二元数据,调用管理装置102对外提供的数据API接口,以便利用该数据API接口间接访问得到存储装置103中存储的目标数据,本实施例对此并不进行限定。
进一步地,计算引擎1012还可以请求对管理装置102中的权限策略进行访问。具体实现时,计算引擎1012可以生成针对管理装置102中的目标权限策略的访问请求,该访问请求包括计算引擎1012的标识、目标权限策略的指示信息以及针对该目标权限策略的操作(查询),并将该访问请求发送给交互模块1021,由交互模块1021将该访问请求提供给鉴权模块1024。鉴权模块1024可以基于针对权限策略的权限配置,对该访问请求进行鉴权,以确定计算引擎1012是否具有访问管理装置102中内置的目标权限策略的权限。从而,在确定该访问请求通过鉴权后,鉴权模块1024可以将目标权限策略翻译为满足权限模型2的权限策略,并将其反馈给交互模块1021,以便由交互模块1021将该权限策略发送给计算引擎1012。
当计算引擎1012需要向存储装置103中存储新数据时,计算引擎1012可以根据针对该新数据的存储计划,生成该新数据对应的元数据,为便于描述,以下称之为原始元数据,并且该原始元数据通常为满足元数据模型2的元数据。然后,计算引擎1012可以生成包括该原始元数据的元数据更新请求(还可以包括计算引擎1012的标识以及请求执行的操作),并将该元数据更新请求发送给管理装置102中的交互模块1021。
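The interaction module 1021 may provide the received metadata update request to the authentication module 1024 and the metadata determination module 1023.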
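The authentication module 1024 may first authenticate the metadata update request. Specifically, it may determine the third permission information carried in the metadata update request, which may include the identifier of the compute engine 1012, the requested operation (such as modify or create), and the original metadata. The authentication module 1024 may then translate the third permission information into fourth permission information that satisfies permission model 1, according to mapping relationship 2 between permission model 2 and permission model 1, and authenticate the fourth permission information using the preconfigured permission policy (and access control policy) to determine whether the compute engine 1012 has permission to perform the operation on the original metadata. It generates an authentication result for the metadata update request and feeds it back to the metadata determination module 1023.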
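When the metadata determination module 1023 determines that the authentication result indicates that the metadata update request passes authentication, it translates the original metadata carried in the metadata update request into target metadata that satisfies metadata model 1, according to mapping relationship 1 between metadata model 2 and metadata model 1, and updates the target metadata to the management device 102. Specifically, the target metadata may be persistently stored in the management device 102.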
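The foregoing implementation is described using an example in which the metadata determination module 1023 translates the original metadata after determining that the metadata update request passes authentication. In other possible implementations, the metadata determination module 1023 and the authentication module 1024 may perform metadata translation and request authentication in parallel. This embodiment imposes no limitation in this respect.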
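After the management device 102 persistently stores the target metadata, the compute engine 1012 may write the new data to the storage device 103 according to the original metadata or the target metadata. The compute engine 1012 may access the storage device 103 directly and write the new data to it; alternatively, the compute engine 1012 may call the data API provided externally by the management device 102 and write the new data to the storage device 103 indirectly through the management device 102.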
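The write path (authenticate first, then translate the original metadata and persist the target metadata) can be sketched as below. This is a simplified illustration under stated assumptions: the `ManagementDevice` class, its in-memory `store` standing in for persistent storage, and the reduction of the policy to (engine, action) pairs are all hypothetical.

```python
from typing import Any, Dict, Optional, Set, Tuple


class ManagementDevice:
    """Toy stand-in for management device 102 on the write path."""

    def __init__(self,
                 field_mapping: Dict[str, str],
                 action_mapping: Dict[str, str],
                 policy: Set[Tuple[str, str]]) -> None:
        self.field_mapping = field_mapping    # model-2 field -> model-1 field
        self.action_mapping = action_mapping  # model-2 action -> model-1 action
        self.policy = policy                  # allowed (engine, model-1 action)
        self.store: Dict[str, Dict[str, Any]] = {}  # stands in for persistence

    def update_metadata(self, engine_id: str, action: str, key: str,
                        original_metadata: Dict[str, Any]) -> bool:
        # Step 1: translate the permission information and authenticate it.
        model1_action: Optional[str] = self.action_mapping.get(action)
        if model1_action is None or (engine_id, model1_action) not in self.policy:
            return False
        # Step 2: only after authentication passes, translate the metadata.
        target_metadata = {
            self.field_mapping[name]: value
            for name, value in original_metadata.items()
            if name in self.field_mapping
        }
        # Step 3: keep the target metadata in model-1 form.
        self.store[key] = target_metadata
        return True


device = ManagementDevice(
    field_mapping={"tbl": "table_name", "path": "location"},
    action_mapping={"CREATE": "create", "ALTER": "update"},
    policy={("engine-1012", "create")},
)
created = device.update_metadata(
    "engine-1012", "CREATE", "orders", {"tbl": "orders", "path": "/data/orders"})
rejected = device.update_metadata(
    "engine-1012", "ALTER", "orders", {"tbl": "orders_v2"})
```

In the sketch a rejected request leaves the stored metadata untouched, which matches the description that the target metadata is updated only after the request passes authentication.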
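In this way, the compute engine 1012 deployed in the computing device 101 can use metadata model 2 and permission model 2, which are adapted to it, to interoperate with metadata model 1 and permission model 1 built into the management device 102, so that the compute engine 1012 reads and writes data in the storage device 103 through the management device 102.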
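In a further application scenario, the computing device 101 includes multiple types of compute engines (such as the compute engine 1011 and the compute engine 1012 in FIG. 1), and the management device 102 includes the metadata model and the permission model adapted to each of those compute engines, such as metadata models 1 and 2 and permission models 1 and 2 above. By establishing mappings between the different metadata models and between the different permission models, data can be shared among the different compute engines. Taking the compute engine 1011 and the compute engine 1012 as an example, the compute engine 1012 may write new data to the storage device 103 using metadata model 2 and permission model 2, and the compute engine 1011 may write new data to the storage device 103 using metadata model 1 and permission model 1. Because mapping relationship 1 exists between metadata model 2 and metadata model 1, and mapping relationship 2 exists between permission model 2 and permission model 1, the compute engine 1011 can access the data written (or the data tables created) by the compute engine 1012 based on mapping relationship 1 between the metadata models and mapping relationship 2 between the permission models; likewise, the compute engine 1012 can access the data written (or the data tables created) by the compute engine 1011 based on the same two mapping relationships.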
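Cross-engine sharing can be illustrated with a small sketch that keeps all metadata in model-1 form and translates at the boundary. It assumes mapping relationship 1 is an invertible field-name mapping; the `catalog` dictionary, the field names, and the `rename` helper are hypothetical.

```python
from typing import Any, Dict

# Hypothetical mapping relationship 1, usable in both directions.
MODEL2_TO_MODEL1: Dict[str, str] = {
    "tbl": "table_name",
    "cols": "columns",
    "path": "location",
}
MODEL1_TO_MODEL2: Dict[str, str] = {v: k for k, v in MODEL2_TO_MODEL1.items()}


def rename(metadata: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename mapped fields; unmapped fields are dropped in this sketch."""
    return {mapping[k]: v for k, v in metadata.items() if k in mapping}


# Metadata kept by the management device, always in model-1 form.
catalog: Dict[str, Dict[str, Any]] = {}

# Compute engine 1012 registers a table using metadata model 2.
written_by_1012 = {"tbl": "orders", "cols": ["id"], "path": "/data/orders"}
catalog["orders"] = rename(written_by_1012, MODEL2_TO_MODEL1)

# Compute engine 1011 registers a table using metadata model 1 directly.
catalog["users"] = {
    "table_name": "users",
    "columns": ["uid"],
    "location": "/data/users",
}

# Each engine sees the other's table in its own model.
seen_by_1011 = catalog["orders"]                            # already model 1
seen_by_1012 = rename(catalog["users"], MODEL1_TO_MODEL2)   # translated to model 2
```

Storing a single canonical form means each additional engine needs only one mapping to model 1 rather than a mapping to every other engine's model.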
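FIG. 14 is a schematic flowchart of a data processing method according to an embodiment of this application. The method may be applied to the data processing system 100 shown in FIG. 1, or to other applicable scenarios. The following description uses the data processing system 100 shown in FIG. 1 as an example. In this embodiment, the operations performed by the computing device 101 are specifically performed by the compute engine 1012 in the computing device 101, and the operations performed by the management device 102 are specifically performed by the functional modules included in the management device 102.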
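The data processing method shown in FIG. 14 may specifically include the following steps:
S1401: The compute engine 1012 generates an access request for the metadata of the target data and sends the access request to the interaction module 1021.
S1402: The interaction module 1021 provides the access request to both the metadata determination module 1023 and the authentication module 1024.
S1403: In response to the access request, the metadata determination module 1023 determines the metadata of the target data according to mapping relationship 1 between metadata model 1 and metadata model 2, and feeds the metadata of the target data back to the interaction module 1021, where metadata model 2 is adapted to the compute engine 1012 and the determined metadata of the target data satisfies metadata model 2.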
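In a specific implementation, mapping relationship 1 exists between metadata model 1 and metadata model 2. After determining the metadata corresponding to the target data according to the access request, the metadata determination module 1023 may therefore use mapping relationship 1 to translate the metadata that satisfies the data structure of metadata model 1 into metadata that satisfies the data structure of metadata model 2 (that is, the metadata of the target data). Mapping relationship 1 between metadata model 1 and metadata model 2 may be established in advance by the mapping module 1022 in the management device 102. For the specific process of establishing mapping relationship 1, refer to the relevant descriptions in the foregoing embodiments; details are not repeated here.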
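S1404: The authentication module 1024 authenticates the access request according to mapping relationship 2 between permission model 1 and permission model 2, and feeds the authentication result back to the interaction module 1021, where permission model 2 is adapted to the compute engine 1012.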
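In a specific implementation, the authentication module 1024 may determine the first permission information carried in the access request. The first permission information may include, for example, the identifier of the compute engine 1012, the requested operation, and the indication information of the target data, and the first permission information satisfies permission model 2. The authentication module 1024 may then translate the first permission information into second permission information that satisfies permission model 1, according to mapping relationship 2 between permission model 2 and permission model 1, and authenticate the second permission information using the preconfigured permission policy to determine whether the compute engine 1012 has permission to perform the operation on the metadata. It generates an authentication result for the access request and feeds it back to the interaction module 1021. Mapping relationship 2 between permission model 1 and permission model 2 may be established in advance by the mapping module 1022. For the specific process of establishing mapping relationship 2, refer to the relevant descriptions in the foregoing embodiments; details are not repeated here.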
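S1405: After determining from the authentication result that the access request passes authentication, the interaction module 1021 sends the metadata of the target data to the compute engine 1012.
S1406: The compute engine 1012 reads the target data stored in the storage device 103 according to the metadata of the target data.
The compute engine 1012 may access the storage device 103 directly according to the metadata to read the target data in the storage device 103; alternatively, the compute engine 1012 may call the data API provided externally by the management device 102 and read the target data in the storage device 103 indirectly through the management device 102.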
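It should be noted that the order in which the steps are shown in this embodiment is not limiting; in other embodiments, the steps may be performed in a different order. For example, steps S1403 and S1404 may be performed at the same time. Alternatively, the authentication module 1024 may first authenticate the access request and provide the resulting authentication result to the metadata determination module 1023; the metadata determination module 1023 then translates the metadata that satisfies metadata model 1 into metadata that satisfies metadata model 2 only after determining from the authentication result that the access request passes authentication.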
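The two orderings mentioned above can be compared in a short sketch. It assumes S1403 and S1404 are independent functions of the request; `determine_metadata`, `authenticate`, the `ALLOWED_ENGINES` set, and the `translation_calls` log are hypothetical stand-ins, and threads are used only to model concurrent execution.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional

ALLOWED_ENGINES = {"engine-1012"}   # hypothetical permission policy
translation_calls: List[str] = []   # records each metadata translation


def determine_metadata(request: Dict[str, str]) -> Dict[str, str]:
    """S1403 stand-in: look up and translate the metadata."""
    translation_calls.append(request["table"])
    return {"tbl": request["table"]}


def authenticate(request: Dict[str, str]) -> bool:
    """S1404 stand-in: check the request against the policy."""
    return request["engine"] in ALLOWED_ENGINES


def handle_parallel(request: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Run S1403 and S1404 concurrently, then gate the reply on the result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        metadata_future = pool.submit(determine_metadata, request)
        auth_future = pool.submit(authenticate, request)
        metadata = metadata_future.result()
        passed = auth_future.result()
    return metadata if passed else None


def handle_auth_first(request: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Run S1404 first; skip the translation when authentication fails."""
    if not authenticate(request):
        return None
    return determine_metadata(request)
```

Both variants return the same result. The parallel form can reduce latency for requests that pass authentication, while the authentication-first form avoids translation work for requests that are rejected.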
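In addition, the data processing method provided in this embodiment corresponds to the data processing system 100 shown in FIG. 1. For the specific implementation of steps S1401 to S1406, refer to the relevant descriptions in the foregoing embodiments; details are not repeated here.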
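This embodiment is described using an example in which the compute engine 1012 reads the target data in the storage device 103. When the compute engine 1012 writes new data to the storage device 103, the compute engine 1012 may first generate the original metadata corresponding to the new data according to a storage plan for the new data; the original metadata satisfies metadata model 2, which is adapted to the compute engine 1012. The compute engine 1012 may then generate a metadata update request that includes the original metadata (and may also include the identifier of the compute engine 1012 and the requested operation), and send the metadata update request to the interaction module 1021 in the management device 102. The interaction module 1021 may provide the received metadata update request to the authentication module 1024 and the metadata determination module 1023. Next, the authentication module 1024 may first authenticate the metadata update request. Specifically, it may determine the third permission information carried in the metadata update request, which may include the identifier of the compute engine 1012, the requested operation (such as modify or create), and the original metadata; the third permission information satisfies permission model 2. The authentication module 1024 may then translate the third permission information into fourth permission information that satisfies permission model 1, according to mapping relationship 2 between permission model 2 and permission model 1, and authenticate the fourth permission information using the preconfigured permission policy to determine whether the compute engine 1012 has permission to perform the operation on the original metadata. It generates an authentication result for the metadata update request and feeds it back to the metadata determination module 1023. When the metadata determination module 1023 determines that the authentication result indicates that the metadata update request passes authentication, it translates the original metadata carried in the metadata update request into target metadata that satisfies metadata model 1, according to mapping relationship 1 between metadata model 2 and metadata model 1, and updates the target metadata to the management device 102; specifically, the target metadata may be persistently stored in the management device 102. Finally, the compute engine 1012 may write the new data to the storage device 103 according to the original metadata or the target metadata.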
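In the embodiments shown in FIG. 1 to FIG. 14, the management device 102 involved in the data processing process (including the interaction module 1021, the mapping module 1022, the metadata determination module 1023, and the authentication module 1024) may be software configured on a computing device or a computing device cluster; running the software on the computing device or computing device cluster enables it to implement the functions of the management device 102. The following describes the management device 102 involved in the data processing process in detail from the perspective of hardware implementation.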
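FIG. 15 is a schematic structural diagram of a computing device on which the management device 102 may be deployed. The computing device may be a computing device in a cloud environment (such as a server), a computing device in an edge environment, a terminal device, or the like, and may specifically be used to implement the functions of the interaction module 1021, the mapping module 1022, the metadata determination module 1023, and the authentication module 1024 in the embodiment shown in FIG. 1.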
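As shown in FIG. 15, the computing device 1500 includes a processor 1510, a memory 1520, a communication interface 1530, and a bus 1540. The processor 1510, the memory 1520, and the communication interface 1530 communicate with one another through the bus 1540. The bus 1540 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is used in FIG. 15, but this does not mean that there is only one bus or one type of bus. The communication interface 1530 is used to communicate with external parties, for example to obtain an access request or to return the metadata of the target data.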
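The processor 1510 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits. The processor 1510 may also be an integrated circuit chip with signal processing capability. During implementation, the functions of the modules in the management device 102 may be completed by integrated logic circuits of hardware in the processor 1510 or by instructions in the form of software. The processor 1510 may also be a general-purpose processor, a digital signal processor (DSP), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The methods disclosed in the embodiments of this application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1520; the processor 1510 reads the information in the memory 1520 and, in combination with its hardware, completes some or all of the functions of the management device 102.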
The memory 1520 may include a volatile memory, such as a random access memory (RAM). The memory 1520 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, an HDD, or an SSD.
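The memory 1520 stores executable code, and the processor 1510 executes the executable code to perform the method performed by the management device 102 described above.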
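Specifically, when the embodiment shown in FIG. 1 is implemented and the interaction module 1021, the mapping module 1022, the metadata determination module 1023, and the authentication module 1024 described in that embodiment are implemented in software, the software or program code required to perform the functions of these modules is stored in the memory 1520. The interaction between the interaction module 1021 and other devices is implemented through the communication interface 1530, and the processor executes the instructions in the memory 1520 to implement the method performed by the management device 102.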
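FIG. 16 is a schematic structural diagram of a computing device cluster. The computing device cluster 160 shown in FIG. 16 includes multiple computing devices, and the management device 102 may be deployed in a distributed manner on multiple computing devices in the computing device cluster 160. As shown in FIG. 16, the computing device cluster 160 includes multiple computing devices 1600, and each computing device 1600 includes a memory 1620, a processor 1610, a communication interface 1630, and a bus 1640, where the memory 1620, the processor 1610, and the communication interface 1630 are communicatively connected to one another through the bus 1640.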
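The processor 1610 may be a CPU, a GPU, an ASIC, or one or more integrated circuits. The processor 1610 may also be an integrated circuit chip with signal processing capability. During implementation, some functions of the management device 102 may be completed by integrated logic circuits of hardware in the processor 1610 or by instructions in the form of software. The processor 1610 may also be a DSP, an FPGA, a general-purpose processor, another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute some of the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1620; in each computing device 1600, the processor 1610 reads the information in the memory 1620 and, in combination with its hardware, can complete some functions of the management device 102.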
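The memory 1620 may include a ROM, a RAM, a static storage device, a dynamic storage device, a hard disk (such as an SSD or HDD), and the like. The memory 1620 may store program code, for example, some or all of the program code for implementing the interaction module 1021, the mapping module 1022, the metadata determination module 1023, or the authentication module 1024. For each computing device 1600, when the program code stored in the memory 1620 is executed by the processor 1610, the processor 1610 performs, based on the communication interface 1630, part of the method performed by the management device 102. For example, some computing devices 1600 may perform the method performed by the interaction module 1021, some may perform the method performed by the mapping module 1022, some may perform the method performed by the metadata determination module 1023, and some may perform the method performed by the authentication module 1024. The memory 1620 may also store data, for example intermediate data or result data generated by the processor 1610 during execution, such as the first metadata, the second metadata, the first permission information, and the second permission information described above.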
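The communication interface 1630 in each computing device 1600 is used to communicate with external parties, for example to interact with other computing devices 1600.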
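The bus 1640 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. For ease of illustration, the bus 1640 in each computing device 1600 in FIG. 16 is represented by only one thick line, but this does not mean that there is only one bus or one type of bus.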
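The multiple computing devices 1600 establish communication paths with one another through a communication network to implement the functions of the management device 102. Any of the computing devices may be a computing device in a cloud environment (for example, a server), a computing device in an edge environment, or a terminal device.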
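In addition, an embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores instructions that, when run on one or more computing devices, cause the one or more computing devices to perform the methods performed by the modules of the management device 102 in the foregoing embodiments.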
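In addition, an embodiment of this application further provides a computer program product. When the computer program product is executed by one or more computing devices, the one or more computing devices perform any one of the foregoing data processing methods. The computer program product may be a software installation package; when any one of the foregoing data processing methods needs to be used, the computer program product may be downloaded and executed on a computer.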
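It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the accompanying drawings of the apparatus embodiments provided in this application, a connection between modules indicates that they have a communication connection, which may specifically be implemented as one or more communication buses or signal lines.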
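From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus the necessary general-purpose hardware, or by dedicated hardware, including application specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function completed by a computer program can readily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may vary, for example analog circuits, digital circuits, or dedicated circuits. For this application, however, a software implementation is the better choice in most cases. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.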
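The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product.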
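The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device, such as a training device or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.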
Claims (20)
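- A data processing method, wherein the method is applied to a data processing system, the data processing system comprises a compute engine, a management device, and a storage device, the management device has a built-in first metadata model and a built-in first permission model, the first metadata model and the first permission model are adapted to the storage device, and the method comprises: receiving, by the management device, an access request that is sent by the compute engine and that is for metadata of target data, wherein the target data is stored in the storage device; determining, by the management device in response to the access request, the metadata of the target data according to a first mapping relationship, wherein the first mapping relationship is a mapping relationship between the first metadata model and a second metadata model, the second metadata model is adapted to the compute engine, and the metadata of the target data satisfies the second metadata model; authenticating, by the management device, the access request according to a second mapping relationship, wherein the second mapping relationship is a mapping relationship between the first permission model and a second permission model, and the second permission model is adapted to the compute engine; and sending, by the management device, the metadata of the target data to the compute engine after the access request passes authentication.
- The method according to claim 1, wherein the method further comprises: receiving, by the management device, a metadata update request sent by the compute engine; translating, by the management device in response to the metadata update request and according to the first mapping relationship, original metadata carried in the metadata update request into target metadata, wherein the original metadata satisfies the second metadata model and the target metadata satisfies the first metadata model; authenticating, by the management device, the metadata update request according to the second mapping relationship; and updating, by the management device, the target metadata to the management device after the metadata update request passes authentication.
- The method according to claim 1 or 2, wherein the method further comprises: outputting, by the management device, a configuration interface; and establishing, by the management device, the first mapping relationship in response to a first operation performed by a user on the configuration interface, and establishing the second mapping relationship in response to a second operation performed by the user on the configuration interface.
- The method according to claim 3, wherein the method further comprises: generating, by the management device in response to a third operation performed by the user on the configuration interface, an access control policy for the second metadata model and the second permission model.
- The method according to any one of claims 1 to 4, wherein the data processing system comprises multiple types of compute engines, and the management device comprises a metadata model and a permission model adapted to each of the multiple types of compute engines.
- The method according to any one of claims 1 to 5, wherein the metadata of the target data is second metadata, and the determining, by the management device in response to the access request, the metadata of the target data according to a first mapping relationship comprises: reading, by the management device, first metadata according to the access request, wherein the first metadata satisfies the first metadata model; and translating, by the management device according to the first mapping relationship, the first metadata into the second metadata that satisfies the second metadata model; the authenticating, by the management device, the access request according to a second mapping relationship comprises: translating, by the management device according to the second mapping relationship, first permission information in the access request into second permission information that satisfies the first permission model, wherein the first permission information satisfies the second permission model; and authenticating, by the management device, the second permission information; and the sending, by the management device, the metadata of the target data to the compute engine after the access request passes authentication comprises: sending, by the management device, the second metadata to the compute engine after the second permission information passes authentication.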
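- A data processing system, wherein the data processing system comprises a compute engine, a management device, and a storage device, the management device has a built-in first metadata model and a built-in first permission model, and the first metadata model and the first permission model are adapted to the storage device; the compute engine is configured to generate an access request for metadata of target data and send the access request to the management device; the management device is configured to: in response to the access request, determine the metadata of the target data according to a first mapping relationship, and authenticate the access request according to a second mapping relationship; and after the access request passes authentication, send the metadata of the target data to the compute engine, wherein the first mapping relationship is a mapping relationship between the first metadata model and a second metadata model, the second metadata model is adapted to the compute engine, the metadata of the target data satisfies the second metadata model, the second mapping relationship is a mapping relationship between the first permission model and a second permission model, and the second permission model is adapted to the compute engine; and the compute engine is further configured to read, according to the metadata of the target data, the target data stored in the storage device.
- The data processing system according to claim 7, wherein the compute engine is further configured to generate a metadata update request and send the metadata update request to the management device; and the management device is further configured to: in response to the metadata update request, translate, according to the first mapping relationship, original metadata carried in the metadata update request into target metadata, wherein the target metadata satisfies the first metadata model and the original metadata satisfies the second metadata model; authenticate the metadata update request according to the second mapping relationship; and after the metadata update request passes authentication, update the target metadata to the management device.
- The data processing system according to claim 7 or 8, wherein the management device is further configured to output a configuration interface, establish the first mapping relationship in response to a first operation performed by a user on the configuration interface, and establish the second mapping relationship in response to a second operation performed by the user on the configuration interface.
- The data processing system according to claim 9, wherein the management device is further configured to generate, in response to a third operation performed by the user on the configuration interface, an access control policy for the second metadata model and the second permission model.
- The data processing system according to any one of claims 7 to 10, wherein the data processing system comprises multiple types of compute engines, and the management device comprises a metadata model and a permission model adapted to each of the multiple types of compute engines.
- The data processing system according to any one of claims 7 to 11, wherein the metadata of the target data is second metadata; and the management device is specifically configured to: read first metadata according to the access request, wherein the first metadata satisfies the first metadata model; translate, according to the first mapping relationship, the first metadata into the second metadata that satisfies the second metadata model; translate, according to the second mapping relationship, first permission information in the access request into second permission information that satisfies the first permission model, wherein the first permission information satisfies the second permission model; authenticate the second permission information; and after the second permission information passes authentication, send the second metadata to the compute engine.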
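- A management device, wherein the management device is applied to a data processing system, the data processing system further comprises a compute engine and a storage device, the management device has a built-in first metadata model and a built-in first permission model, the first metadata model and the first permission model are adapted to the storage device, and the management device comprises: an interaction module, configured to receive an access request that is sent by the compute engine and that is for metadata of target data, wherein the target data is stored in the storage device; a metadata determination module, configured to determine, in response to the access request, the metadata of the target data according to a first mapping relationship, wherein the first mapping relationship is a mapping relationship between the first metadata model and a second metadata model, the second metadata model is adapted to the compute engine, and the metadata of the target data satisfies the second metadata model; and an authentication module, configured to authenticate the access request according to a second mapping relationship, wherein the second mapping relationship is a mapping relationship between the first permission model and a second permission model, and the second permission model is adapted to the compute engine; wherein the interaction module is further configured to send the metadata of the target data to the compute engine after the access request passes authentication.
- The management device according to claim 13, wherein the interaction module is further configured to receive a metadata update request sent by the compute engine; the metadata determination module is further configured to translate, in response to the metadata update request and according to the first mapping relationship, original metadata carried in the metadata update request into target metadata, wherein the original metadata satisfies the second metadata model and the target metadata satisfies the first metadata model; the authentication module is further configured to authenticate the metadata update request according to the second mapping relationship; and the interaction module is further configured to update the target metadata to the management device after the metadata update request passes authentication.
- The management device according to claim 13 or 14, wherein the interaction module is further configured to output a configuration interface; and the management device further comprises: a mapping module, configured to establish the first mapping relationship in response to a first operation performed by a user on the configuration interface, and establish the second mapping relationship in response to a second operation performed by the user on the configuration interface.
- The management device according to any one of claims 13 to 15, wherein the mapping module is further configured to generate, in response to a third operation performed by the user on the configuration interface, an access control policy for the second metadata model and the second permission model.
- The management device according to any one of claims 13 to 16, wherein the data processing system comprises multiple types of compute engines, and the management device comprises a metadata model and a permission model adapted to each of the multiple types of compute engines.
- The management device according to any one of claims 13 to 17, wherein the metadata of the target data is second metadata; the metadata determination module is specifically configured to read first metadata according to the access request, wherein the first metadata satisfies the first metadata model, and translate, according to the first mapping relationship, the first metadata into the second metadata that satisfies the second metadata model; the authentication module is specifically configured to translate, according to the second mapping relationship, first permission information in the access request into second permission information that satisfies the first permission model, wherein the first permission information satisfies the second permission model, and authenticate the second permission information; and the interaction module is specifically configured to send the second metadata to the compute engine after the second permission information passes authentication.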
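- A computing device cluster, comprising at least one computing device, wherein each computing device comprises a processor and a memory, and the processor is configured to execute instructions stored in the memory, so that the computing device cluster performs the method according to any one of claims 1 to 6.
- A computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when run on at least one computing device, cause the at least one computing device to perform the method according to any one of claims 1 to 6.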
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211446388.9 | 2022-11-18 | ||
CN202211446388.9A CN118093500A (zh) | 2022-11-18 | 2022-11-18 | Data processing method, system, apparatus, and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024103714A1 (zh) | 2024-05-23 |
Family
ID=91083715
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/100673 WO2024103714A1 (zh) | Data processing method, system, apparatus, and related device | 2022-11-18 | 2023-06-16 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN118093500A (zh) |
WO (1) | WO2024103714A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118410513A (zh) * | 2024-07-04 | 2024-07-30 | 北京国电通网络技术有限公司 | 面向数据库中间件的dpu内嵌式细粒度访问方法及系统 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006026636A2 (en) * | 2004-08-31 | 2006-03-09 | Ascential Software Corporation | Metadata management |
CN112307122A (zh) * | 2020-10-30 | 2021-02-02 | 杭州海康威视数字技术股份有限公司 | 一种基于数据湖的数据管理系统及方法 |
CN112364110A (zh) * | 2020-11-17 | 2021-02-12 | 深圳前海微众银行股份有限公司 | 元数据管理方法、装置、设备及计算机存储介质 |
CN113468166A (zh) * | 2020-03-31 | 2021-10-01 | 广州虎牙科技有限公司 | 元数据处理方法、装置、存储介质及服务器 |
CN113761294A (zh) * | 2021-09-10 | 2021-12-07 | 北京火山引擎科技有限公司 | 数据管理方法、装置、存储介质以及电子设备 |
- 2022
  - 2022-11-18 CN CN202211446388.9A patent/CN118093500A/zh active Pending
- 2023
  - 2023-06-16 WO PCT/CN2023/100673 patent/WO2024103714A1/zh unknown
Also Published As
Publication number | Publication date |
---|---|
CN118093500A (zh) | 2024-05-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23890156 Country of ref document: EP Kind code of ref document: A1 |