CN113810434B - Distributed file system access method, device, host and medium - Google Patents
Info
- Publication number
- CN113810434B (application number CN202010529578.1A)
- Authority
- CN
- China
- Prior art keywords
- file system
- parent
- access
- child
- distributed file
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
Abstract
The present disclosure provides a distributed file system access method, apparatus, host, and medium. The method comprises receiving user operation code and starting a parent process and a child process. The child process executes the user operation code and, on reaching an access instruction for the distributed file system, jumps to the parent process; the parent process executes the access instruction and then returns to the child process. The parent process is a security framework process within the platform to which the distributed file system belongs. Embodiments of the disclosure improve the security, convenience, and universality of accessing the distributed file system.
Description
Technical Field
The present invention relates to the field of big data, and more particularly, to a distributed file system access method, apparatus, host, and medium.
Background
For security and other reasons, a distributed big data platform forbids reading and writing the local file system by default. Many business scenarios nevertheless need to call the file system to read and write files. For example, to meet users' personalized requirements, unstructured data must be read and written in a distributed, parallel manner, and resource files in special formats must be loaded, edited, read, and written; all of these require calling the distributed file system.
In the prior art, one scheme for accessing a distributed file system is to call a third-party file system. MaxCompute, for example, supports distributed parallel reading and writing of unstructured data by reading and writing files on external file systems such as OSS. This scheme has a high threshold: it requires applying for costly third-party storage and many supporting measures, which increases the user's dependencies. Another, temporary scheme is to let users run code that reads and writes the file system inside a sandbox. However, this approach is not absolutely safe and is inconvenient to use. It generally supports only user code that uses MapReduce as its interface and does not support the languages and syntaxes most familiar to users, such as SQL, so it is not universal. The prior art therefore lacks a convenient and universal method of accessing a distributed file system.
Disclosure of Invention
In view of this, the present disclosure aims to provide a convenient, secure, and universal technique for accessing a distributed file system.
To achieve this object, according to one aspect of the present disclosure, there is provided a distributed file system access method including:
receiving a user operation code;
starting a parent process and a child process, wherein the child process executes the user operation code and jumps to the parent process when executing an access instruction to the distributed file system, the parent process executes the access instruction and returns to the child process after the access instruction is executed, and the parent process is a security framework process within the platform to which the distributed file system belongs.
Optionally, the method is performed by a host in the platform, and starting the parent process and the child process includes assigning the parent process to a first machine of the platform other than the host for execution, and assigning the child process to a second machine of the platform other than the host for execution, the second machine being different from the first machine.
Optionally, in addition to the host, the platform further comprises working machines and slaves, and the first machine and the second machine are each selected from the working machines and the slaves.
Optionally, the proxy between the parent process and the child process is isolated.
Optionally, the distributed file system is divided into a persistent file system type and a single point file system type, the user operation code indicates a type of the distributed file system, and the parent process performs a first access operation to the persistent file system or a second access operation to the single point file system according to the indicated type.
Optionally, before the parent process and the child process are enabled, the method further comprises creating the child process by the parent process, and after the parent process and the child process are enabled, destroying the child process by the parent process.
Optionally, the receiving the user operation code includes:
providing an execution layer context, the execution layer context comprising a handle of a distributed file system, the handle pointing to a predefined function or interface;
in response to the user's request to acquire the handle, returning the predefined function or interface to which the handle points;
receiving user operation code written by the user with the predefined function or interface.
Optionally, after receiving the user operation code, the method further comprises adapting the file system targeted by the user operation code, the adaptation including at least mapping the targeted file system to a file system prefix, providing permission authentication, and supporting file interfaces.
Optionally, providing permission authentication comprises performing permission authentication according to the permission information of the file unit to be accessed by the access instruction to the distributed file system, the identity of the user, the content accessed by the access instruction, and the access time.
Optionally, the user operation code includes a parameter setting statement code that specifies a file to be accessed in the distributed file system.
Optionally, the user operation code includes a tool class, the tool class is a program segment, and a file to be accessed in the distributed file system is specified based on an execution result of the tool class.
Optionally, the parent process has a plurality of child processes.
According to one aspect of the present disclosure, there is provided a distributed file system access apparatus comprising:
a user interface unit for receiving a user operation code;
a parent-child process starting unit for starting a parent process and a child process, wherein the child process executes the user operation code and jumps to the parent process when executing an access instruction to the distributed file system, the parent process executes the access instruction and returns to the child process after the access instruction is executed, and the parent process is a security framework process within the platform to which the distributed file system belongs.
Optionally, the apparatus is located in a host in the platform, and the parent-child process starting unit is further configured to assign the parent process to a first machine of the platform other than the host for execution, and to assign the child process to a second machine of the platform other than the host for execution, the second machine being different from the first machine.
Optionally, in addition to the host, the platform further comprises working machines and slaves, and the first machine and the second machine are each selected from the working machines and the slaves.
Optionally, the proxy between the parent process and the child process is isolated.
Optionally, the distributed file system is divided into a persistent file system type and a single point file system type, the user operation code indicates a type of the distributed file system, and the parent process performs a first access operation to the persistent file system or a second access operation to the single point file system according to the indicated type.
Optionally, the apparatus further comprises a parent-child process lifecycle management unit for creating a child process by the parent process before enabling the parent process and the child process, and destroying the child process by the parent process after enabling the parent process and the child process.
Optionally, the user interface unit is further configured to:
providing an execution layer context, the execution layer context comprising a handle of a distributed file system, the handle pointing to a predefined function or interface;
in response to the user's request to acquire the handle, returning the predefined function or interface to which the handle points;
receiving user operation code written by the user with the predefined function or interface.
Optionally, the apparatus further comprises a file interface unit for adapting the file system targeted by the user operation code, the adaptation including at least mapping the targeted file system to a file system prefix, providing permission authentication, and supporting file interfaces.
Optionally, providing permission authentication comprises performing permission authentication according to the permission information of the file unit to be accessed by the access instruction to the distributed file system, the identity of the user, the content accessed by the access instruction, and the access time.
Optionally, the user operation code includes a parameter setting statement code that specifies a file to be accessed in the distributed file system.
Optionally, the user operation code includes a tool class, the tool class is a program segment, and a file to be accessed in the distributed file system is specified based on an execution result of the tool class.
Optionally, the parent process has a plurality of child processes.
According to one aspect of the present disclosure, there is provided a host comprising a memory for storing computer executable code, and a processor for executing the computer executable code to implement the method as described above.
According to one aspect of the present disclosure, a computer readable medium is provided, comprising computer executable code which when executed by a processor implements a method as described above.
Embodiments of the present disclosure use parent-child processes to access a distributed file system. The child process executes the user operation code and jumps to the parent process when executing an access instruction to the distributed file system; the parent process executes the access instruction and returns to the child process after the access instruction is executed. Because the parent process is a security framework process within the platform to which the distributed file system belongs, the security of accessing the distributed file system is ensured. The scheme uses the underlying distributed file system directly without introducing a third-party file system, which improves convenience for users, reduces extra expense, and removes the need for approval of various permissions, making it simple and efficient. In addition, the parent-child process approach is independent of language and syntax, supports multiple languages and syntaxes, and improves universality.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing embodiments thereof with reference to the following drawings in which:
FIG. 1 illustrates a framework diagram of the big data platform to which the distributed file system belongs and to which embodiments of the present disclosure are applied;
FIG. 2 illustrates a flow chart of a distributed file system access method according to one embodiment of the present disclosure.
FIG. 3 illustrates a hierarchical structure diagram of a distributed file system access mechanism according to one embodiment of the present disclosure.
FIG. 4 illustrates a common file system interface hierarchy diagram of a distributed file system access mechanism showing adaptation for file interface units and a distributed file system according to one embodiment of the present disclosure.
FIG. 5 illustrates a block diagram of a distributed file system access device according to one embodiment of the present disclosure.
FIG. 6 illustrates a block diagram of a host according to one embodiment of the present disclosure.
Detailed Description
The present invention is described below based on examples, but the present invention is not limited to only these examples. In the following detailed description of the present invention, certain specific details are set forth in detail. The present invention will be fully understood by those skilled in the art without the details described herein. Well-known methods, procedures, and flows have not been described in detail so as not to obscure the nature of the invention. The figures are not necessarily drawn to scale.
FIG. 1 illustrates a framework diagram of the big data platform to which the distributed file system belongs and to which embodiments of the present disclosure are applied.
A big data platform is a platform for storing, computing on, and displaying the ever-growing volume of data generated in modern society. Big data technology refers to technology that quickly obtains valuable information from a wide variety of types of data, including Massively Parallel Processing (MPP) databases, data mining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems. The big data platform integrates data access, data processing, data storage, query and search, analysis and mining, and application interfaces.
A distributed file system is a system in which the physical storage resources managed by the file system are not necessarily attached directly to a local node but are connected to the node through a computer network. The design of a distributed file system is based on a client/server model. A typical network may include multiple servers for multiple users to access. In addition, the peer-to-peer feature allows some systems to play the dual role of client and server. For example, a user may "publish" a directory that other clients are allowed to access; once accessed, the directory appears to the client like a local drive.
As shown in FIG. 1, the big data platform includes a host 110, working machines 120, and slaves 130. There is only one host 110 in the big data platform; it issues tasks to the working machines 120 and controls their task execution. There are multiple working machines 120, which execute the tasks assigned by the host 110. Each working machine 120 is provided with slaves 130, which are controlled by the working machine 120 and complete the tasks it assigns.
The distributed file system is distributed among the machines shown in FIG. 1. It is a computer program that manages the hardware and software resources of those machines, and it is the kernel and cornerstone of the big data platform. By being distributed across the host 110, the working machines 120, and the slaves 130, it controls the software and hardware resources of each machine.
A process is one running activity of a program on a given data set. It is the basic unit of resource allocation and scheduling in a system and the foundation of the operating system structure. A process is the basic execution entity of a program and a container for threads. A child process 20 is a process created by another process (the corresponding parent process 10), and it inherits most attributes of that parent process 10, such as file descriptors. A process may have multiple child processes 20 but at most one parent process 10. In the disclosed embodiment, the child process 20 accesses the distributed file system through the parent process 10 and itself carries out the general user operations. The parent process 10 in the disclosed embodiment is a framework process of the big data platform, and all of its business logic is developed and verified by the big data platform, which ensures the security of both the platform and the user code. The disclosed embodiments use parent-child processes to move access to the distributed file system, which is prone to security risks, into the parent process 10 for execution, and the parent process 10 can be developed securely under the big data platform framework. In this way, parent-child processes directly guarantee the security of the distributed file system; the approach is convenient and universal, needs no third-party system, and avoids various additional operation and maintenance costs.
In the disclosed embodiment, the user operation code specifies all operations to be performed, including normal operations (operations that do not access the distributed file system) and operations that access the distributed file system. It is handed to the child process 20 for execution, but the child process 20 jumps to the parent process 10 when executing an access instruction to the distributed filesystem (e.g., via a remote procedure call RPC, which is a protocol that requests service from a remote computer program over a network without knowledge of underlying network technology), the access instruction is executed by the parent process 10, and returns to the child process 20 after the access instruction is executed (e.g., via the RPC).
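The jump from child to parent can be illustrated with a minimal in-process Java sketch. Everything in it is an assumption made for illustration: the interface `FileAccessService` stands in for the RPC boundary, an in-memory map stands in for the distributed file system, and the class names and path are hypothetical. In the disclosed embodiment the two sides would be separate processes, possibly on different machines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the RPC boundary between child and parent.
interface FileAccessService {
    String read(String path);
    void write(String path, String content);
}

// Parent process: trusted framework code, the only side that touches the file system.
class ParentProcessSketch implements FileAccessService {
    // In-memory map standing in for the distributed file system (assumption).
    private final Map<String, String> files = new HashMap<>();
    // Records every access instruction the parent executed.
    final List<String> auditLog = new ArrayList<>();

    @Override
    public String read(String path) {
        auditLog.add("read " + path);
        return files.get(path);
    }

    @Override
    public void write(String path, String content) {
        auditLog.add("write " + path);
        files.put(path, content);
    }
}

// Child process: runs user operation code and holds no direct file system reference.
class ChildProcessSketch {
    private final FileAccessService parent;

    ChildProcessSketch(FileAccessService parent) {
        this.parent = parent;
    }

    // Example user operation code: an ordinary operation followed by file access.
    String runUserCode(String input) {
        String upper = input.toUpperCase();       // ordinary operation, executed in the child
        parent.write("/volume/out.txt", upper);   // access instruction: jump to the parent
        return parent.read("/volume/out.txt");    // result comes back to the child
    }
}

class ParentChildDemo {
    public static void main(String[] args) {
        ParentProcessSketch parent = new ParentProcessSketch();
        ChildProcessSketch child = new ChildProcessSketch(parent);
        System.out.println(child.runUserCode("hello"));
        System.out.println(parent.auditLog);
    }
}
```

The point of the design is that the child only ever sees the narrow service interface, so every file access passes through code the platform controls.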
In one embodiment, the parent process 10 is assigned to a first machine (a working machine or a slave) of the big data platform other than the host 110 for execution, and the child process is assigned to a second machine (a working machine or a slave) of the platform other than the host 110 for execution, the second machine being different from the first machine. In FIG. 1, the parent process is in the working machine 120 and the child process is in its slave 130, but the opposite is also possible, i.e., the parent process is in the slave 130 and the child process is in the working machine 120. Alternatively, the parent and child processes may be in two different working machines 120 or in two different slaves 130.
As shown in FIG. 1, the parent process is in the working machine 120 and the child process is in its slave 130. After receiving the user operation code, the host 110 assigns it to the parent process 10 of the working machine 120, and the parent process 10 hands all of it, without distinction, to the child process 20 of the slave 130 for execution. When the child process 20 executes an access instruction to the distributed file system, it calls the parent process 10 through RPC or the like, the parent process 10 executes the access instruction, and execution returns to the child process 20 through RPC or the like after the access instruction is executed.
The work machine 120 has a call server 121, and the slave 130 has a call client 131. The main function of the call server 121 and the call client 131 is to communicate (e.g., RPC call) to implement the call between parent and child processes (including the parent process 10 calling the child process 20, and the child process 20 calling the parent process 10 described above).
Calls between the parent process 10 and the child process 20 are classified into synchronous calls and asynchronous calls.
When a synchronous call is made, the call server 121 transmits a handshake message to the call client 131, and the call client 131 responds to the handshake message in real time. After the call server 121 receives the real-time response, the two parties have established a handshake. The parent process 10 then begins to call the child process 20, or the child process 20 calls the parent process 10. That is, the caller transmits to the callee the program that the callee needs to execute.
When an asynchronous call is made, a pipe is opened between the call server 121 and the call client 131. The call client 131 listens to the pipe through the read port. The call server 121 writes a response request to the pipe; the call client 131 does not answer in real time, and the response request remains in the pipe. When the call client 131 has read the last word of the response request, it returns a reply. Then the parent process 10 begins to call the child process 20, or the child process 20 calls the parent process 10. That is, the caller transmits to the callee the program that the callee needs to execute. Allowing calls in asynchronous mode improves the flexibility of the call operation.
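The difference between the two call modes can be shown with a small single-threaded Java sketch. It is only an illustration under stated assumptions: an in-memory queue stands in for the pipe, the class `CallChannelSketch` and the `ack:` reply format are hypothetical, and no real network or RPC library is involved.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical model of the channel between call server 121 and call client 131.
class CallChannelSketch {
    // Stands in for the pipe opened between the two sides for asynchronous calls.
    private final Queue<String> pipe = new ConcurrentLinkedQueue<>();

    // Synchronous call: the client answers the handshake message in real time.
    String handshake(String message) {
        return "ack:" + message;
    }

    // Asynchronous call, server side: write the request and return without waiting.
    void writeRequest(String request) {
        pipe.add(request);
    }

    // Number of requests still waiting in the pipe.
    int pending() {
        return pipe.size();
    }

    // Asynchronous call, client side: read one request later and reply to it.
    // Returns null when the pipe is empty.
    String readAndReply() {
        String request = pipe.poll();
        return request == null ? null : "ack:" + request;
    }
}

class CallChannelDemo {
    public static void main(String[] args) {
        CallChannelSketch channel = new CallChannelSketch();
        System.out.println(channel.handshake("hello"));   // answered immediately
        channel.writeRequest("run-task");                 // stays in the pipe
        System.out.println(channel.pending());
        System.out.println(channel.readAndReply());       // answered when the client reads
    }
}
```

In the asynchronous path the writer is never blocked by the reader, which is the flexibility the description refers to.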
As shown in FIG. 1, when executing a task, the host 110 issues the task's plan (via user operation code) to the parent process 10 of a working machine 120, and the parent process 10 issues it to its subordinate child process 20 on a slave 130. The child process 20 returns the progress of execution to the parent process 10, and the parent process 10 returns it to the host 110. When the host 110 wants to access data, it issues the data access request (via user operation code) to the parent process 10 of the working machine 120, and the parent process 10 issues it to its subordinate child process 20 on the slave 130. If the request is not an access to the distributed file system, the child process 20 performs the access; if a modification is involved, it returns the modification result to the parent process 10 of the working machine 120, and the parent process 10 returns the result to the host 110. If the request is an access to the distributed file system, the child process 20 jumps to the parent process 10 (through the call between the call server 121 and the call client 131), the parent process 10 executes the access, and execution returns to the child process 20 (again through the call between the call server 121 and the call client 131) after the access is completed.
In the disclosed embodiment, the parent process 10 and the child process 20 are proxy-isolated, i.e., the proxy between the parent process 10 and the child process 20 is isolated. In conventional parent-child process access, the parent process 10 accesses the child process 20 through a proxy, which increases the likelihood of compromise. The disclosed embodiment replaces the conventional proxy access between the parent process 10 and the child process 20 with direct access and isolates the proxy, which greatly reduces the security risks that proxy access introduces.
As shown in fig. 2, a distributed file system access method is provided, which is performed by the host 110. The method comprises the following steps:
Step 210, receiving a user operation code;
Step 220, starting a parent process and a child process, wherein the child process executes the user operation code and jumps to the parent process when executing an access instruction to the distributed file system, the parent process executes the access instruction and returns to the child process after the access instruction is executed, and the parent process is a security framework process within the platform to which the distributed file system belongs.
The above steps are described in detail below.
In step 210, a user operation code is received.
User operation code is code written by a user to perform operations on the big data platform. It may include code that accesses the distributed file system as well as code for other operations. Code for other operations is executed by the child process 20; code that accesses the distributed file system is executed by the parent process 10 after the child process 20 jumps to it.
In one embodiment, step 210 includes:
providing an execution layer context, the execution layer context comprising a handle of a distributed file system, the handle pointing to a predefined function or interface;
in response to the user's request to acquire the handle, returning the predefined function or interface to which the handle points;
receiving user operation code written by the user with the predefined function or interface.
An execution layer context is an environment in which program code executes. Some information or the like used in executing the program code may be put into the context, facilitating execution of the program code.
The user operation code is written by the user. Generally, a user interface (i.e., the user application layer, implemented by the user interface unit 310 of FIG. 3) is opened to the user, and the user programs there. In the disclosed embodiments, the user does not have to type every line of code into the user interface by hand; instead, the system lets the user invoke predefined functions or interfaces, which simplifies operation. Examples include the general syntax of the distributed SQL language, such as custom functions (Udf), custom aggregate functions (Udaf), and custom table functions (Udtf), as well as general MapReduce usage and custom interfaces for Graph computation. These do not need to be developed separately, and handles to these predefined functions or interfaces are placed in the execution layer context.
A handle is a unique integer value, i.e., a value 4 bytes long (8 bytes in a 64-bit program), used to identify different objects in an application and different instances of the same class, such as a window, button, icon, scroll bar, output device, control, or file. In the disclosed embodiment, a handle identifies a predefined function or interface, such as a custom function (Udf), a custom aggregate function (Udaf), a custom table function (Udtf), general MapReduce usage, or a custom interface for Graph computation. An application can access information about the corresponding object through a handle, but a handle is not a pointer, and a program cannot use a handle to read the information in a file directly.
Thus, the user is provided with an execution layer context that includes handles to predefined functions or interfaces. When the user's code needs a predefined function or interface, the user requests the handle, the system returns the predefined function or interface to which the handle points, and the user writes the user operation code in the user interface with it.
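A minimal Java sketch of such a handle lookup follows. The class `ExecutionContextSketch`, its method names, and the use of `Function<String, String>` to represent a predefined function are assumptions chosen for illustration; the patent does not specify how the execution layer context stores its handles.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical execution layer context that maps integer handles to predefined functions.
class ExecutionContextSketch {
    private final Map<Integer, Function<String, String>> handles = new HashMap<>();
    private int nextHandle = 1;

    // Platform side: register a predefined function and hand out an opaque integer handle.
    int register(Function<String, String> predefined) {
        int handle = nextHandle++;
        handles.put(handle, predefined);
        return handle;
    }

    // User side: resolve a handle to the predefined function it identifies.
    Function<String, String> resolve(int handle) {
        Function<String, String> predefined = handles.get(handle);
        if (predefined == null) {
            throw new IllegalArgumentException("unknown handle: " + handle);
        }
        return predefined;
    }
}

class ExecutionContextDemo {
    public static void main(String[] args) {
        ExecutionContextSketch context = new ExecutionContextSketch();
        // A trimming function plays the role of a predefined custom function here.
        int trimHandle = context.register(String::trim);
        // User operation code acquires the function through the handle and applies it.
        System.out.println(context.resolve(trimHandle).apply("  value  "));
    }
}
```

Because the user only holds an integer, the platform stays free to change how and where the predefined functions are implemented.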
The benefit of this embodiment is that use of the user application layer is unified: the general syntax of distributed SQL, such as custom functions (Udf) and custom aggregate functions (Udaf), can be used without separate development, and the file system handle is always obtained from the execution layer context. The user keeps the original programming style and the general file I/O interface, with nothing new to learn, which lowers the learning cost and improves universality.
In addition, the disclosed embodiment provides a simplified and unified way of setting parameters: the names, labels, and so on of input and output files are specified by entering simple statements in the interface. That is, the user operation code includes parameter setting statement code that specifies a file to be accessed in the distributed file system.
In one embodiment, the file to be accessed is specified by a parameter setting statement of the form set K=V, where K is an identifier specifying the file to be accessed and V is the location or label where the file is stored. A label marks one or more storage locations so that files stored there can later be specified by the label alone. Labels are particularly useful when files at multiple storage locations are specified at once: the locations can share one label, so that the label denotes access to the files at all of them. This approach supports various computing models such as SQL, MapReduce, and Graph without secondary adaptation or development. An exemplary parameter setting statement is as follows:
set odps.sql.volume.input[/output].desc=[<project>.]<table>.<partition>[:<label>];
Parameters in square brackets are optional. odps.sql.volume.input[/output].desc denotes the file to be accessed, where input or output indicates whether the file is input or output; project denotes the group in which the file is located, table denotes the table in which the file is located, and partition denotes the partition in which the file is located. A group contains tables, and a table contains partitions. label denotes the label of the file to be accessed.
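To make the value format concrete, here is a hedged Java sketch that parses a value of the form `[<project>.]<table>.<partition>[:<label>]`. The class name, the regular expression, and the rule that names contain no `.` or `:` are assumptions for illustration; the actual platform parser is not described in the source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the value of odps.sql.volume.input[/output].desc.
class VolumeDescriptorSketch {
    // Optional project, then table and partition, then an optional label after ':'.
    // Assumes that names contain neither '.' nor ':'.
    private static final Pattern DESC =
            Pattern.compile("^(?:([^.:]+)\\.)?([^.:]+)\\.([^.:]+)(?::([^.:]+))?$");

    final String project;   // null when omitted
    final String table;
    final String partition;
    final String label;     // null when omitted

    private VolumeDescriptorSketch(String project, String table, String partition, String label) {
        this.project = project;
        this.table = table;
        this.partition = partition;
        this.label = label;
    }

    static VolumeDescriptorSketch parse(String value) {
        Matcher m = DESC.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("bad descriptor: " + value);
        }
        return new VolumeDescriptorSketch(m.group(1), m.group(2), m.group(3), m.group(4));
    }
}

class VolumeDescriptorDemo {
    public static void main(String[] args) {
        // The names below are made-up sample values.
        VolumeDescriptorSketch full = VolumeDescriptorSketch.parse("myproject.mytable.part1:mylabel");
        System.out.println(full.project + " " + full.table + " " + full.partition + " " + full.label);
        VolumeDescriptorSketch brief = VolumeDescriptorSketch.parse("mytable.part1");
        System.out.println(brief.project + " " + brief.table + " " + brief.partition + " " + brief.label);
    }
}
```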
In one embodiment, the files to be accessed in the distributed file system are not specified by way of the parameter set statements described above, but rather by way of a tool class. The user operation code includes a tool class. A tool class is a program segment on the basis of which a file to be accessed in the distributed file system can be specified.
The tool class approach offers some flexibility. It is a variant of the parameter setting statement approach: the tool class translates the parameters that the user sets in an object-oriented way (a program fragment) into set parameters. The two approaches differ only in how the user perceives them; to the platform they are unified, and both are submitted and recognized through the set mechanism. The following are illustrative examples of tool classes:
InputUtils.addVolume(new VolumeInfo([project,]inVolume,inPartition,"inLabel"),new JobConf());
OutputUtils.addVolume(new VolumeInfo([project,]outVolume,outPartition,"outLabel"),new JobConf());
The scheme of the embodiment of the disclosure unifies usage at the user application layer: a user can keep using the original programming model and the general file I/O interface without any new learning cost.
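The translation that a tool class performs, from an object describing a volume to a set parameter, might look like the following. This is a sketch under stated assumptions: VolumeInfoSketch and ToolClassTranslator are hypothetical stand-ins and are not the VolumeInfo, InputUtils or OutputUtils classes of the platform, whose real behavior is not given in the disclosure. Only the parameter name and value syntax are taken from the exemplary set statement.

```java
/** Hypothetical stand-in for the volume description a user builds in code. */
final class VolumeInfoSketch {
    final String project;    // may be null: the project is optional
    final String volume;
    final String partition;
    final String label;      // may be null: the label is optional

    VolumeInfoSketch(String project, String volume, String partition, String label) {
        this.project = project;
        this.volume = volume;
        this.partition = partition;
        this.label = label;
    }
}

/** Hypothetical translator from the object form to the set-statement form. */
final class ToolClassTranslator {
    static String toSetStatement(VolumeInfoSketch info, boolean isInput) {
        StringBuilder sb = new StringBuilder("set odps.sql.volume.");
        sb.append(isInput ? "input" : "output").append(".desc=");
        if (info.project != null) {
            sb.append(info.project).append('.');
        }
        sb.append(info.volume).append('.').append(info.partition);
        if (info.label != null) {
            sb.append(':').append(info.label);
        }
        return sb.append(';').toString();
    }
}
```

Because both routes end in the same set parameter, the platform only has to recognize one form, which is the unification described above.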
In addition, in one embodiment, after step 210, the method further comprises adapting the file system targeted by the user operation code, the adaptation comprising at least mapping the targeted file system to a file system prefix, providing authority authentication, and supporting file interfaces. This operation is performed by the file interface unit 320 shown in figs. 3-4.
The user interface unit 310 of the user application layer is the user-oriented interface unit of the big data platform and is responsible for the input of user operation codes, while the file interface unit 320 is a file-oriented interface unit responsible for adapting the user operation codes to the file system, i.e. determining which files the user operation codes are to access. It can be implemented on the basis of the current general file system interface, the Hadoop FileSystem interface, which the embodiment of the disclosure extends. The extensions mainly comprise mapping from a file system to a file system prefix, providing authority authentication, and supporting file interfaces, including the symlink, snapshot, acl and xattr related interfaces, the listCorruptFileBlocks interface and the seekToNewSource interface, and implementing the concat and truncate interfaces.
Mapping file systems to file system prefixes helps a user quickly locate the storage area corresponding to the file system to be accessed; file systems with the same file system prefix are stored in adjacent areas.
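One way to picture the prefix mapping is a registry that resolves a path to a file system by its longest matching prefix. The sketch below is illustrative only: the class PrefixRegistry is a hypothetical name, and the disclosure does not specify the concrete prefix strings or the matching rule.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical registry that maps file system prefixes to file system names. */
final class PrefixRegistry {
    private final Map<String, String> prefixToFileSystem = new HashMap<>();

    void register(String prefix, String fileSystemName) {
        prefixToFileSystem.put(prefix, fileSystemName);
    }

    /** Returns the file system whose registered prefix is the longest match for the path. */
    Optional<String> resolve(String path) {
        String best = null;
        for (String prefix : prefixToFileSystem.keySet()) {
            boolean longer = best == null || prefix.length() > best.length();
            if (path.startsWith(prefix) && longer) {
                best = prefix;
            }
        }
        return best == null
                ? Optional.empty()
                : Optional.of(prefixToFileSystem.get(best));
    }
}
```

For example, after registering the made-up prefixes "volume://" and "tempfile://", a path that begins with "volume://" resolves to the persistent file system, and a path with no registered prefix resolves to nothing.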
Authority authentication allows authority information to be set separately for each file unit of the distributed file system. A file unit is the minimum unit for which the same authority information is set. For example, in a persistent (Volume) file system, the same system file is stored as multiple backups on multiple working machines or slave machines; the authority information of these backups should be identical, so the backups together are treated as one file unit. The authority information specifies the rights to access the file unit, including user identity rights, accessible file content rights, access time rights, and the like. For example, if it is specified that user A can access only file units of type X, and only from Monday to Friday, then determining that the accessor is user A is not sufficient; the accessed file content and the access time must also be checked. Authority judgment is therefore a multifaceted judgment. Authority authentication is performed according to the authority information of the file unit to be accessed by the access instruction to the distributed file system, the identity of the user, the access content of the access instruction, and the access time. If the identity of the user, the access content of the access instruction and the access time all conform to the authority information, the authentication succeeds. In this way, the security of access is improved.
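The multifaceted judgment described above can be sketched as a conjunction of three checks. The class PermissionInfo and its fields are hypothetical; the disclosure does not define how authority information is represented, and modeling access time as a set of weekdays is an assumption taken from the Monday-to-Friday example.

```java
import java.time.DayOfWeek;
import java.util.Set;

/** Hypothetical authority information attached to one file unit. */
final class PermissionInfo {
    private final Set<String> allowedUsers;         // user identity rights
    private final Set<String> allowedContentTypes;  // accessible file content rights
    private final Set<DayOfWeek> allowedDays;       // access time rights (assumed: weekdays)

    PermissionInfo(Set<String> allowedUsers, Set<String> allowedContentTypes,
                   Set<DayOfWeek> allowedDays) {
        this.allowedUsers = allowedUsers;
        this.allowedContentTypes = allowedContentTypes;
        this.allowedDays = allowedDays;
    }

    /** Authentication succeeds only if identity, content and time all conform. */
    boolean permits(String user, String contentType, DayOfWeek day) {
        return allowedUsers.contains(user)
                && allowedContentTypes.contains(contentType)
                && allowedDays.contains(day);
    }
}
```

With authority information that allows user A, content type X and the days Monday to Friday, an access by user A to type X on a Tuesday is permitted, while the same access on a Sunday, an access by another user, or an access to another content type is rejected.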
The support of various file interfaces can expand the range of files which can be accessed by a user and improve the user experience.
As shown in fig. 4, a common file system interface 321 is further disposed between the distributed file system 330 and the file interface unit 320. It performs semantic adaptation between the distributed file system 330 and the file interface unit 320, so as to achieve semantic compatibility and provide support for the open-source ecosystem. Because the extensions of the file interface unit 320, such as the above-mentioned mapping of the file system to the file system prefix, providing authority authentication and supporting file interfaces, are mainly made in open-source form, while the distributed file system 330 is generally in the cloud, the common file system interface 321 is needed to perform semantic adaptation. The common file system interface 321 is a Java interface that accesses the distributed file system 330, on the basis of which compatibility with open-source HDFS is achieved. The common file system interface 321 has two sets of access paths to the distributed file system 330: one in the manner of a single process 15 according to the prior art, and one in the manner of the child process 20 and the parent process 10 invoking each other according to an embodiment of the present disclosure.
In step 220, parent and child processes are enabled.
The user operation code received in step 210 is executed by the child process 20. When an access instruction to the distributed file system is reached, execution jumps to the parent process 10, the access instruction is executed by the parent process 10, and execution returns to the child process 20 after the access instruction has been executed. This process is described in detail in the description of the framework in connection with fig. 1 and is not repeated here.
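The control flow of step 220 can be sketched in a single JVM by letting the child hold only a proxy to the parent. This is an in-process stand-in, not the disclosed implementation: a real deployment would cross a process boundary (for example by RPC), and the names FileAccessProxy, ParentProcessSketch and ChildProcessSketch are hypothetical.

```java
import java.util.List;

/** The only file system capability the child is given (hypothetical interface). */
interface FileAccessProxy {
    String read(String path);
}

/** Stand-in for the parent process: the only place where file access is executed. */
final class ParentProcessSketch implements FileAccessProxy {
    private final List<String> trace;

    ParentProcessSketch(List<String> trace) {
        this.trace = trace;
    }

    @Override
    public String read(String path) {
        trace.add("parent: execute read " + path);
        return "contents of " + path;   // placeholder for a real file system read
    }
}

/** Stand-in for the child process: runs user code and delegates file access. */
final class ChildProcessSketch {
    private final FileAccessProxy parent;
    private final List<String> trace;

    ChildProcessSketch(FileAccessProxy parent, List<String> trace) {
        this.parent = parent;
        this.trace = trace;
    }

    String runUserCode(String path) {
        trace.add("child: start user code");
        String data = parent.read(path);        // jump to the parent for the access
        trace.add("child: resume user code");   // control returns to the child
        return data;
    }
}
```

Since the child only sees the FileAccessProxy interface, user code has no direct route to the file system, which mirrors the isolation the parent-child structure is meant to provide.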
In one embodiment, as shown in fig. 3, the distributed file system is divided into a persistent (Volume) file system 331 type and a single point (TempFile) file system 332 type, which support operations on persistent files and single point files, respectively. To meet persistent storage requirements, a persistent file is stored as multiple backups on different working machines 120 and slave machines 130 of the big data platform. A single point file is used for temporary reading and writing and is stored on a single node (one working machine 120 or one slave machine 130); it is valid only during the lifecycle of the current job and is cleared automatically when the job finishes. Different file systems may be selected for different demand scenarios.
The user operation code received in step 210 indicates the type of distributed file system, and the parent process 10 performs a first access operation to the persistent file system 331 or a second access operation to the single point file system 332 according to the indicated type. That is, if the type is a persistent file system 331 type, parent process 10 performs a persistent file system access 101, and if the type is a single point file system 332, parent process 10 performs a single point file system access 102.
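The type-based dispatch in the parent process can be sketched as follows. The enum FileSystemType and the class AccessDispatcher are hypothetical names, and the returned strings merely stand in for the persistent file system access 101 and the single point file system access 102.

```java
/** The two file system types named in the disclosure (hypothetical enum). */
enum FileSystemType {
    VOLUME,     // persistent file system 331
    TEMP_FILE   // single point file system 332
}

/** Hypothetical dispatcher run by the parent process. */
final class AccessDispatcher {
    static String dispatch(FileSystemType type, String path) {
        switch (type) {
            case VOLUME:
                return "persistent file system access 101: " + path;
            case TEMP_FILE:
                return "single point file system access 102: " + path;
            default:
                throw new IllegalArgumentException("unknown type: " + type);
        }
    }
}
```

Keeping the dispatch in the parent means the child only has to state which type it wants; the choice of access path stays inside the platform's framework process.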
In addition, in one embodiment, the child process is created by the parent process before the parent process and the child process are enabled, and is destroyed by the parent process after they have been enabled. By prescribing a complete scheme in which the parent process creates, monitors and destroys child processes, lifecycle management of child processes in the parent-child process structure is improved. Furthermore, a parent process may create a plurality of child processes, each of which executes steps 210-220, and destroy each child process after its data access is completed, thereby managing the lifecycles of the plurality of child processes efficiently.
In the embodiment of the disclosure, the parent-child process model has the following advantages. The framework of the distributed file system is separated from the user program, so the system structure is clearer. The parent process specifically handles problems inherent to the distributed file system and significant problems affecting security, while the child process handles problems that do not affect the security of the distributed file system. The parent and child processes can interact through RPC calls, which makes cross-language support easier. User execution logic runs in the child process, which facilitates security control. The upper layer can be abstracted into a more general framework: common services can be extracted and executed by the parent process, while the child process focuses on specific execution. In addition, the parent-child model supports a one-to-many mode, i.e., one parent process may correspond to multiple child processes. This is necessary in certain scenarios, for example where a scripting language (e.g., Python, shell) does not support multithreading, so that concurrency can be achieved only with multiple processes.
The embodiment of the disclosure uses the underlying distributed file system directly without introducing a third-party file system, which makes use more convenient for users and reduces additional overhead. Compared with the prior-art approach of accessing the distributed file system by calling a third-party file system, the threshold is lower: there is no need to apply for third-party storage; no need for secondary development, such as file splitting and data serialization and deserialization, tailored to the characteristics of the third-party file system in distributed computation; no need to provide a similar-looking framework and syntax interface to support and recognize the user's inputs and outputs, configuration parameters and the like; and no added dependencies for the user. In addition, the embodiment of the disclosure supports reading and writing files stored in the underlying distributed storage through Java interfaces in a custom program. The custom program that calls the file system runs in an isolated environment, and calls that operate on the file system are made through the parent-child process proxy. No complex permissions need to be applied for and approved; within the group's internal environment the file system can be used simply by setting parameters, which is simple, easy to use, safe and controllable. The embodiment of the disclosure unifies the user interface of the upper user application layer and is compatible with and reuses the existing syntax of the big data platform, such as SQL and MapReduce; the file system can be called uniformly by obtaining a file system handle from the execution layer context, without separately developing functions or interfaces. The user only needs to use the original programming model and the general file I/O interface, which reduces the user's learning cost.
As shown in fig. 5, according to one embodiment of the present disclosure, there is provided a distributed file system access apparatus 300 including:
A user interface unit 310 for receiving a user operation code;
And a parent-child process enabling unit 320, configured to enable a parent process and a child process, where the child process executes the user operation code, jumps to the parent process when executing an access instruction to the distributed file system, the parent process executes the access instruction, and returns to the child process after executing the access instruction, and the parent process is an intra-platform security framework process to which the distributed file system belongs.
Optionally, the apparatus 300 is located in a host in the platform, and the parent-child process enabling unit 320 is further configured to allocate the parent process to a first machine execution in the platform other than the host, and allocate the child process to a second machine execution in the platform other than the host, the second machine being different from the first machine.
Optionally, the platform further comprises a working machine and a slave machine in addition to the master machine, and the first machine and the second machine are each selected from any one of the working machine and the slave machine.
Optionally, agents between the parent process and the child process are isolated.
Optionally, the distributed file system is divided into a persistent file system type and a single point file system type, the user operation code indicates a type of the distributed file system, and the parent process performs a first access operation to the persistent file system or a second access operation to the single point file system according to the indicated type.
Optionally, the apparatus further comprises a parent-child process lifecycle management unit (not shown) for creating a child process by the parent process before enabling the parent process and the child process, and destroying the child process by the parent process after enabling the parent process and the child process.
Optionally, the user interface unit 310 is further configured to:
providing an execution layer context, the execution layer context comprising a handle of a distributed file system, the handle pointing to a predefined function or interface;
responding to the acquisition request of the user for the handle, and returning a predefined function or interface pointed by the handle;
User operation codes written by a user by utilizing the predefined functions or interfaces are received.
Optionally, the device 300 further comprises a file interface unit 320, configured to perform adaptation of a file system targeted by the user operation code, where the adaptation includes at least mapping of the targeted file system to a file system prefix, and provides authority authentication and support for a file interface.
Optionally, the authority authentication is provided, which comprises the authority authentication according to the authority information of the file unit to be accessed by the access instruction of the distributed file system, the identity of the user, the access content of the access instruction and the access time.
Optionally, the user operation code includes a parameter setting statement code that specifies a file to be accessed in the distributed file system.
Optionally, the user operation code includes a tool class, the tool class is a program segment, and a file to be accessed in the distributed file system is specified based on an execution result of the tool class.
Optionally, the parent process has a plurality of child processes.
Since the distributed file system access method of the present disclosure has been described in detail above in connection with fig. 2, implementation details of the distributed file system access apparatus 300 are substantially identical to those of the distributed file system access method, and thus are not described in detail.
The distributed file system access method according to one embodiment of the present disclosure may be implemented by the host 110 of fig. 6. The host 110 according to an embodiment of the present disclosure is described below with reference to fig. 6. The host 110 shown in fig. 6 is merely an example, and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
As shown in fig. 6, the host 110 is in the form of a general purpose computing device. The components of host 110 may include, but are not limited to, at least one processing unit 810, at least one storage unit 820, and a bus 830 that connects the various system components, including the storage unit 820 and the processing unit 810.
The storage unit 820 stores program code that is executable by the processing unit 810, such that the processing unit 810 performs the steps of the various exemplary embodiments of the invention described in the description of the exemplary methods above in this specification. For example, the processing unit 810 may perform the various steps as shown in fig. 2.
The storage unit 820 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 8201 and/or cache memory 8202, and may further include Read Only Memory (ROM) 8203.
Storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 830 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
Host 110 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with host 110, and/or any device (e.g., router, modem, etc.) that enables host 110 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 850. Also, host 110 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network such as the Internet via network adapter 860. As shown, network adapter 860 communicates with other modules of host 110 over bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with host 110 including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It will be appreciated that the above description is only of a preferred embodiment of the invention and is not intended to limit the invention, and that many variations of the embodiments of the present description exist to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
It should be understood that each embodiment in this specification is described in an incremental manner, and the same or similar parts between each embodiment are all referred to each other, and each embodiment focuses on differences from other embodiments.
It should be understood that the foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
It should be understood that elements described herein in the singular or shown in the drawings are not intended to limit the number of elements to one. Furthermore, modules or elements described or illustrated herein as separate may be combined into a single module or element, and modules or elements described or illustrated herein as a single may be split into multiple modules or elements.
It is also to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. The use of these terms and expressions is not meant to exclude any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications are possible and are intended to be included within the scope of the claims. Other modifications, variations, and alternatives are also possible. Accordingly, the claims should be looked to in order to cover all such equivalents.
Claims (22)
1. A distributed file system access method, comprising:
the method comprises the steps of receiving user operation codes, carrying out adaptation of a file system aimed by the user operation codes, wherein the adaptation at least comprises mapping of the aimed file system to a file system prefix, providing authority authentication and supporting a file interface, and the providing authority authentication comprises the following steps:
Performing authority authentication according to the authority information of the file unit to be accessed by the access instruction of the distributed file system, the identity of the user, the access content of the access instruction and the access time;
And starting a parent process and a child process, wherein the child process executes the user operation code, jumps to the parent process when executing the access instruction to the distributed file system, executes the access instruction by the parent process, and returns to the child process after the access instruction is executed, and the parent process is an intra-platform security framework process to which the distributed file system belongs.
2. The method of claim 1, wherein the method is performed by a host in the platform, the enabling parent and child processes comprising:
The parent process is assigned to a first machine execution in the platform other than the host, and the child process is assigned to a second machine execution in the platform other than the host, the second machine being different from the first machine.
3. The method of claim 2, wherein the platform includes a work machine and a slave machine in addition to the master machine, the first machine and the second machine each being selected from any one of the work machine and the slave machine.
4. The method of claim 1, wherein an agent between the parent process and the child process is isolated.
5. The method of claim 1, wherein the distributed file system is divided into a persistent file system type and a single point file system type, the user operation code indicates a type of the distributed file system, and the parent process performs a first access operation to the persistent file system or a second access operation to the single point file system according to the indicated type.
6. The method of claim 1, wherein prior to enabling the parent process and the child process, the method further comprises creating the child process by the parent process;
after the parent process and the child process are enabled, the method further includes destroying the child process by the parent process.
7. The method of claim 1, wherein the receiving user operation code comprises:
providing an execution layer context, the execution layer context comprising a handle of a distributed file system, the handle pointing to a predefined function or interface;
responding to the acquisition request of the user for the handle, and returning a predefined function or interface pointed by the handle;
User operation codes written by a user by utilizing the predefined functions or interfaces are received.
8. The method of claim 1, wherein the user operation code comprises a parameter set statement code that specifies a file to be accessed in the distributed file system.
9. The method of claim 1, wherein the user operation code comprises a tool class, the tool class being a program fragment, specifying a file to access in the distributed file system based on a result of execution of the tool class.
10. The method of claim 1, wherein the parent process has a plurality of child processes.
11. A distributed file system access apparatus comprising:
The file interface unit is used for adapting a file system aimed at by the user operation code, and at least comprises mapping from the aimed file system to a file system prefix, providing authority authentication and supporting a file interface, wherein the authority authentication providing comprises the following steps:
Performing authority authentication according to the authority information of the file unit to be accessed by the access instruction of the distributed file system, the identity of the user, the access content of the access instruction and the access time;
And the parent-child process starting unit is used for starting a parent process and a child process, wherein the child process executes the user operation code, jumps to the parent process when executing the access instruction to the distributed file system, the parent process executes the access instruction and returns to the child process after executing the access instruction, and the parent process is an intra-platform security framework process to which the distributed file system belongs.
12. The apparatus of claim 11, wherein the apparatus is located in a host in the platform, the parent-child process enabling unit further to:
The parent process is assigned to a first machine execution in the platform other than the host, and the child process is assigned to a second machine execution in the platform other than the host, the second machine being different from the first machine.
13. The apparatus of claim 12, wherein the platform includes a work machine and a slave machine in addition to the master machine, the first machine and the second machine each being selected from any one of the work machine and the slave machine.
14. The apparatus of claim 11, wherein an agent between the parent process and the child process is isolated.
15. The apparatus of claim 11, wherein the distributed file system is divided into a persistent file system type and a single point file system type, the user operation code indicates a type of the distributed file system, and the parent process performs a first access operation to the persistent file system or a second access operation to the single point file system according to the indicated type.
16. The apparatus of claim 11, wherein the apparatus further comprises a parent-child process lifecycle management unit to create a child process by a parent process before enabling the parent process and the child process, and to destroy the child process by the parent process after enabling the parent process and the child process.
17. The apparatus of claim 11, wherein the user interface unit is further to:
providing an execution layer context, the execution layer context comprising a handle of a distributed file system, the handle pointing to a predefined function or interface;
responding to the acquisition request of the user for the handle, and returning a predefined function or interface pointed by the handle;
User operation codes written by a user by utilizing the predefined functions or interfaces are received.
18. The apparatus of claim 11, wherein the user operation code comprises a parameter set statement code that specifies a file to access in the distributed file system.
19. The apparatus of claim 11, wherein the user operation code comprises a tool class, the tool class being a program fragment, specifying a file to access in the distributed file system based on a result of execution of the tool class.
20. The apparatus of claim 11, wherein the parent process has a plurality of child processes.
21. A host, comprising:
a memory for storing computer executable code;
a processor for executing the computer executable code to implement the method of any one of claims 1-11.
22. A computer readable medium comprising computer executable code which when executed by a processor performs the method of any one of claims 1-11.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010529578.1A CN113810434B (en) | 2020-06-11 | 2020-06-11 | Distributed file system access method, device, host and medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113810434A (en) | 2021-12-17 |
| CN113810434B (en) | 2025-07-22 |
Family
ID=78943903
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010529578.1A Active CN113810434B (en) | 2020-06-11 | 2020-06-11 | Distributed file system access method, device, host and medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113810434B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114296962A (en) * | 2021-12-27 | 2022-04-08 | 阿里巴巴新加坡控股有限公司 | Data processing method and device, electronic equipment and computer storage medium |
| CN118803021A (en) * | 2023-04-12 | 2024-10-18 | 华为云计算技术有限公司 | A service management method and device |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108289080A (en) * | 2017-01-09 | 2018-07-17 | 阿里巴巴集团控股有限公司 | A kind of methods, devices and systems accessing file system |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7143288B2 (en) * | 2002-10-16 | 2006-11-28 | Vormetric, Inc. | Secure file system server architecture and methods |
| CA2773342A1 (en) * | 2012-03-30 | 2013-09-30 | Disternet Technology, Inc. | System and method for managing streaming services |
| CN102790761B (en) * | 2012-06-13 | 2015-05-06 | 浙江浙大中控信息技术有限公司 | Regional medical treatment information system and access authority control method |
| JP6529304B2 (en) * | 2015-03-25 | 2019-06-12 | 株式会社日立ソリューションズ | Access control system and access control method |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113810434A (en) | 2021-12-17 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||