CN107861686A

CN107861686A - File storage method, server and computer-readable storage medium

Info

Publication number: CN107861686A
Application number: CN201710885384.3A
Authority: CN
Inventors: 卢道和; 陈晓峰; 杨军; 钱碧伟; 黎君; 胡思文
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2017-09-26
Filing date: 2017-09-26
Publication date: 2018-03-30
Anticipated expiration: 2037-09-26
Also published as: CN107861686B

Abstract

The invention discloses a file storage method applied to an Internet data center. The Internet data center includes a file processing system, a distributed file system, and a distributed storage system; the file processing system includes a server and a client. The method includes: the server receives, through the client, the files uploaded by the sender; caches the received files in a temporary folder and records the storage location information of each file in the distributed storage system; merges the files in the temporary folder to obtain a merged file and stores the merged file in the distributed file system; and, based on the merged file, updates the corresponding storage location information in the distributed storage system. The invention also discloses a server and a computer-readable storage medium. By merging files, the invention increases the amount of data that can be stored, and by storing files in a distributed file system it allows more files to be stored.

Description

File storage method, server and computer-readable storage medium

Technical Field

The present invention relates to the field of application technology, and in particular to a file storage method, a server and a computer-readable storage medium.

Background

Traditional mass file storage relies on a dedicated file server such as NAS (Network Attached Storage). NAS is defined as a special-purpose data storage server that comprises storage devices (such as disk arrays, CD/DVD drives, tape drives or removable storage media) and embedded system software, and provides cross-platform file sharing. A NAS usually occupies its own node on a LAN (Local Area Network) and allows users to access data over the network without the intervention of an application server.

In the existing file storage architecture, multiple front-end servers share a back-end NAS device through a dedicated storage network. The storage space on the back-end NAS device is shared with the front-end hosts through the CIFS (Common Internet File System) and NFS (Network File System) protocols, so that the same directory or file can be read and written concurrently. The file system resides in the back-end storage system, and connections use standard Ethernet links and the TCP (Transmission Control Protocol)/IP (Internet Protocol) protocols, which enables file storage to be shared among multiple systems. However, as time passes and business grows, the scale of data keeps increasing while the storage capacity of NAS devices is limited, and the traditional file storage mode can hardly cope with the explosive growth of data. In other words, as the data volume grows, the capacity of the existing file storage method can hardly meet demand.

Summary of the Invention

The main purpose of the present invention is to provide a file storage method, a server and a computer-readable storage medium, aiming to solve the technical problem that the existing file storage method can hardly meet storage requirements as the data volume increases.

To achieve the above purpose, the present invention provides a file storage method applied to an Internet data center. The Internet data center includes a file processing system, a distributed file system and a distributed storage system, and the file processing system includes a server and a client. The file storage method includes:

The server of the file processing system receives, through the client, the files uploaded by the sender;

The server caches the received files in a temporary folder, and records the storage location information of each file in the distributed storage system;

The server merges the files in the temporary folder to obtain a merged file, and stores the merged file in the distributed file system;

Based on the merged file, the server updates the corresponding storage location information in the distributed storage system.

Optionally, the step of merging the files in the temporary folder to obtain a merged file includes:

The server scans each file in the temporary folder;

The server obtains a combined file, determines among the scanned files those whose size after merging with the combined file would be smaller than a preset threshold, and merges the determined files into the combined file.

Optionally, the method further includes:

When a file query instruction is received, determining the index information corresponding to the file query instruction;

Looking up, in the distributed file system, the merged file pointed to by the index information;

Restoring the merged file, so as to recover from it the file corresponding to the index information.
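The query steps above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each index record holds the path of the merged file together with a byte offset and a byte length; the source does not specify the layout of the index information, and the names IndexEntry and restore_file are hypothetical. In-memory dictionaries stand in for the distributed file system and the distributed storage system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    merged_path: str  # hypothetical: which merged file holds the data
    offset: int       # hypothetical: byte offset inside the merged file
    length: int       # hypothetical: byte length of the original file


def restore_file(merged_files: dict, entry: IndexEntry) -> bytes:
    """Slice one original file back out of the merged file it was packed into."""
    blob = merged_files[entry.merged_path]
    return blob[entry.offset:entry.offset + entry.length]


# Stand-in for the distributed file system: merged path -> merged bytes.
merged_files = {"/merged/part-0001": b"alphabetagamma"}
# Stand-in for the distributed storage system: file name -> index record.
index = {
    "a.txt": IndexEntry("/merged/part-0001", 0, 5),
    "b.txt": IndexEntry("/merged/part-0001", 5, 4),
}

restored = restore_file(merged_files, index["b.txt"])
```

Under these assumptions, restoring a file costs one index lookup plus one ranged read of the merged file, rather than a scan of the whole merged file.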

Optionally, after the step of storing the merged file in the distributed file system, the method further includes:

The server generates file identification information and file hash information based on the files stored in the distributed file system;

The server feeds the file identification information and file hash information back to the sender through the client, so that the sender can transmit them to the receiver;

When the file identification information sent by the receiver is received through the client, the server extracts the corresponding file from the distributed file system and feeds it back to the receiver, so that the receiver can verify the file against the file hash information and obtain the file when verification succeeds.
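The identification and verification exchange above can be sketched in Python as follows. The source names neither the hash algorithm nor the format of the file identification information, so SHA-256 and a random UUID are assumptions made for illustration, and register_file and fetch_and_verify are hypothetical names. A dictionary stands in for the distributed file system.

```python
import hashlib
import uuid


def register_file(store: dict, content: bytes) -> tuple:
    """Server side: store the file and return (file id, file hash) for the sender."""
    file_id = uuid.uuid4().hex          # assumed identifier format
    store[file_id] = content
    return file_id, hashlib.sha256(content).hexdigest()


def fetch_and_verify(store: dict, file_id: str, expected_hash: str) -> bytes:
    """Receiver side: fetch by id, then accept the file only if the hash matches."""
    content = store[file_id]
    if hashlib.sha256(content).hexdigest() != expected_hash:
        raise ValueError("hash mismatch: file rejected")
    return content


store = {}
file_id, file_hash = register_file(store, b"reconciliation data")
received = fetch_and_verify(store, file_id, file_hash)
```

Because the sender passes both values to the receiver, the receiver can detect a corrupted or substituted file without trusting the transport in between.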

Optionally, there are multiple servers, the servers and the client of the file processing system are connected through a gateway, and files are uploaded from the client to the servers as follows: the gateway distributes the files uploaded by the client to the servers in round-robin fashion according to a preset policy.

Optionally, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the method further includes:

The server scans each file in the distributed file system to monitor how long each file has been stored;

When the storage duration of a file reaches a preset duration, the server deletes the file from the distributed file system and deletes the storage location information of the file from the distributed storage system.
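The expiry rule above can be sketched as a single Python function. The sketch assumes that the time at which each file was stored is available as a numeric timestamp; the function name delete_expired and the dictionary stand-ins for the distributed file system and the distributed storage system are hypothetical.

```python
def delete_expired(files: dict, index: dict, now: float, max_age: float) -> list:
    """Remove every file stored for at least max_age, plus its location record.

    files maps file name -> time stored; index maps file name -> location info.
    """
    expired = [name for name, stored_at in files.items()
               if now - stored_at >= max_age]
    for name in expired:
        del files[name]        # delete from the (stand-in) distributed file system
        index.pop(name, None)  # delete the matching storage location information
    return expired


files = {"old.dat": 0.0, "new.dat": 90.0}
index = {"old.dat": "/merged/part-0001", "new.dat": "/merged/part-0002"}
removed = delete_expired(files, index, now=100.0, max_age=50.0)
```

Deleting the file and its location record in the same pass keeps the index from pointing at data that no longer exists.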

Optionally, the Internet data center further includes a distributed application coordination service, and before the step in which the server scans each file in the distributed file system to monitor how long each file has been stored, the method further includes:

The server sends a request to acquire the deletion lock to the distributed application coordination service;

When the lock is acquired successfully, the server performs the step of scanning each file in the distributed file system to monitor how long each file has been stored.
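The lock-then-scan pattern above can be sketched in Python. A local threading.Lock is used here only as a stand-in for the lock granted by the distributed application coordination service; a real deployment would request the lock from that service, and the name run_cleanup_if_lock_acquired is hypothetical.

```python
import threading

# Stand-in for the deletion lock held by the coordination service.
delete_lock = threading.Lock()


def run_cleanup_if_lock_acquired(lock, cleanup) -> bool:
    """Run the scan only if the deletion lock is free; otherwise skip this round."""
    if not lock.acquire(blocking=False):
        return False           # another server holds the lock and does the scan
    try:
        cleanup()
        return True
    finally:
        lock.release()


calls = []
first = run_cleanup_if_lock_acquired(delete_lock, lambda: calls.append("scan"))

# Simulate another server holding the lock while this one tries again.
delete_lock.acquire()
second = run_cleanup_if_lock_acquired(delete_lock, lambda: calls.append("scan"))
delete_lock.release()
```

With several servers running the same program, the lock ensures that only one of them scans and deletes at a time, so no file is processed twice.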

Optionally, the server is located in the primary Internet data center, and where a standby Internet data center exists in the system, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the method includes:

The server synchronizes the stored files to the server of the file processing system in the standby Internet data center, so that the latter performs the file storage operation.

In addition, to achieve the above purpose, the present invention further provides a server. The server includes a memory, a processor, and a file storage program that is stored in the memory and can run on the processor. When executed by the processor, the file storage program implements the steps of the file storage method described above.

In addition, to achieve the above purpose, the present invention further provides a computer-readable storage medium on which a file storage program is stored. When executed by a processor, the file storage program implements the steps of the file storage method described above.

In the technical solution proposed by the present invention, the server of the file processing system first receives, through the client, the files uploaded by the sender, then caches the received files in a temporary folder and records the storage location information of each file in the distributed storage system. It then merges the files in the temporary folder to obtain a merged file, stores the merged file in the distributed file system, and finally updates the corresponding storage location information in the distributed storage system based on the merged file, so that files can later be read according to that information. In this solution, received files are merged before being stored in the distributed file system, and merging increases the number of files the system can store. In addition, because the distributed file system is scalable, storing files in it allows still more files to be stored. Compared with the existing file storage method, this solution can store a larger volume of files and is better suited to storing large numbers of small files.

Description of the Drawings

FIG. 1 is a schematic structural diagram of the server in the hardware operating environment involved in the embodiments of the present invention;

FIG. 2 is a schematic flowchart of the first embodiment of the file storage method of the present invention;

FIG. 3 is a file storage architecture diagram of the present invention;

FIG. 4 is a schematic diagram of file merging in the present invention;

FIG. 5 is a schematic flowchart of the second embodiment of the file storage method of the present invention;

FIG. 6 is an example diagram of file transmission in the present invention;

FIG. 7 is a schematic flowchart of the third embodiment of the file storage method of the present invention;

FIG. 8 is an example diagram of file deletion in the present invention.

The realization of the purpose, the functional features and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it.

The solution of the embodiments of the present invention is mainly as follows: the server of the file processing system first receives, through the client, the files uploaded by the sender, then caches the received files in a temporary folder and records the storage location information of each file in the distributed storage system. It then merges the files in the temporary folder to obtain a merged file, stores the merged file in the distributed file system, and finally updates the corresponding storage location information in the distributed storage system based on the merged file, so that files can later be read according to that information. This solves the problem that the existing file storage method can hardly meet storage requirements.

It should be noted that the existing file storage method also has the following defects:

The file storage solution has no life cycle management and does not support functions such as deleting temporary files on expiry, which easily leads to an excessive amount of stored data;

It is not suited to access by a large number of systems, and installation and deployment are relatively cumbersome.

In view of the problems in the prior art, the present invention builds an FPS (File Process System). The FPS supports the storage of massive amounts of data, and stores multiple copies of the data across racks and data centers to ensure high availability of the service. Its main application scenarios include:

(1) Providing an intermediate platform for file exchange between different systems; for example, system A provides reconciliation files to system B through the intermediate platform for reconciliation;

(2) Providing a data storage platform based on file life cycle management that supports massive data storage, in which files are stored for a period of time and are deleted automatically when they expire.

The technical terms used in the present invention are introduced below:

Hadoop: a distributed system infrastructure that lets users build and use a distributed computing platform. Users can develop and run applications that process massive amounts of data on Hadoop.

HDFS: the Hadoop Distributed File System. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It also provides high-throughput access to application data and is suitable for applications with very large data sets.

HBase: a highly reliable, high-performance, column-oriented and scalable distributed storage system. With HBase, a large-scale structured storage cluster can be built on inexpensive PC servers. It belongs to the Hadoop ecosystem.

Zookeeper: an open-source distributed application coordination service, an open-source implementation of Google's Chubby, and an important component of Hadoop and HBase. It provides consistency services for distributed applications, with functions including configuration maintenance, naming service, distributed synchronization and group services. It belongs to the Hadoop ecosystem.

TGW: short for Tencent GateWay, a system that provides unified access for multiple networks, forwards external network requests and supports automatic load balancing. The TGW may be referred to as a gateway.

NAS: Network Attached Storage, a device that is connected to the network and provides data storage, and is therefore also called "network storage". It is a dedicated data storage server.

RMB: a message bus system used for RPC messaging between multiple systems.

FIG. 1 is a schematic structural diagram of the server in the hardware operating environment involved in the embodiments of the present invention.

The server in the embodiments of the present invention may be a PC (personal computer), or a terminal device with a display function such as a tablet computer or a portable computer.

As shown in FIG. 1, the server may include a processor 1001 (for example a CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface (for example for connecting a wired keyboard or wired mouse) and a wireless interface (for example for connecting a wireless keyboard or wireless mouse). The network interface 1004 may optionally include a standard wired interface (for connecting to a wired network) and a wireless interface (such as a Wi-Fi, Bluetooth or infrared interface, for connecting to a wireless network). The memory 1005 may be a high-speed RAM or a non-volatile memory such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.

Optionally, the server may also include a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a Wi-Fi module and the like.

Those skilled in the art will understand that the server structure shown in FIG. 1 does not limit the server, which may include more or fewer components than shown, combine certain components, or arrange the components differently.

As shown in FIG. 1, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module and a file storage program. The operating system is a program that manages and controls the hardware and software resources of the server and supports the running of the network communication module, the user interface module, the file storage program and other programs or software; the network communication module is used to manage and control the network interface 1004; the user interface module is used to manage and control the user interface 1003.

In the server shown in FIG. 1, the network interface 1004 is mainly used to connect to the server of the standby Internet data center and exchange data with it; the user interface 1003 is mainly used to connect to the client (user side) and exchange data with it; the server invokes, through the processor 1001, the file storage program stored in the memory 1005 and performs the following steps:

Receiving, through the client, the files uploaded by the sender;

Further, the server invokes, through the processor 1001, the file storage program stored in the memory 1005 to implement the step of merging the files in the temporary folder to obtain a merged file:

Scanning each file in the temporary folder;

Further, the server invokes, through the processor 1001, the file storage program stored in the memory 1005 to implement the following steps:

Further, after the step of storing the merged file in the distributed file system, the server invokes, through the processor 1001, the file storage program stored in the memory 1005 to implement the following steps:

Generating file identification information and file hash information based on the files stored in the distributed file system;

Further, there are multiple servers, the servers and the client of the file processing system are connected through a gateway, and files are uploaded from the client to the servers as follows: the gateway distributes the files uploaded by the client to the servers in round-robin fashion according to a preset policy.

Further, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the server invokes, through the processor 1001, the file storage program stored in the memory 1005 to implement the following steps:

Scanning each file in the distributed file system to monitor how long each file has been stored;

Further, the Internet data center also includes a distributed application coordination service, and before the step in which the server scans each file in the distributed file system to monitor how long each file has been stored, the server invokes, through the processor 1001, the file storage program stored in the memory 1005 to implement the following steps:

Sending a request to acquire the deletion lock to the distributed application coordination service;

Further, the server is located in the primary Internet data center, and where a standby Internet data center exists in the system, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the server invokes, through the processor 1001, the file storage program stored in the memory 1005 to implement the following steps:

Synchronizing the stored files to the server of the file processing system in the standby Internet data center, so that the latter performs the file storage operation.

Based on the above hardware structure of the server, the embodiments of the file storage method of the present invention are proposed.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the file storage method of the present invention.

In this embodiment, the file storage method is applied to an Internet data center. The Internet data center includes a file processing system, a distributed file system and a distributed storage system, and the file processing system includes a server and a client. The file storage method includes:

The server of the file processing system receives, through the client, the files uploaded by the sender; caches the received files in a temporary folder and records the storage location information of each file in the distributed storage system; merges the files in the temporary folder to obtain a merged file and stores the merged file in the distributed file system; and, based on the merged file, updates the corresponding storage location information in the distributed storage system.

In this embodiment, the file storage method is applied to the server of the file processing system FPS in an IDC (Internet Data Center); the server may be the server shown in FIG. 1.

It should be noted that the embodiments of the present invention provide an FPS (File Process System), a self-developed system for storing and processing small files that offers features such as life cycle management and cross-data-center disaster recovery.

In the embodiments of the present invention, the structure of the IDC is shown in FIG. 3:

The IDC includes the file processing system FPS, the distributed file system HDFS and the distributed storage system HBase. The FPS consists of two main parts: FPS-Client (client) and FPS-Server (server). Business programs use the functions provided by the FPS by integrating FPS-Client.

FPS-Client: the FPS provides clients in Java, C and Python versions.

FPS-Server: the back-end server program of the FPS, which provides the main business logic, including access control, file storage and retrieval, and file life cycle management.

As can be seen from FIG. 3, the server and the client of the file processing system are connected through a gateway, denoted TGW. The TGW (Tencent GateWay) performs load balancing when forwarding files.

It should be noted that the numbers of FPS servers and clients are not limited and can be set according to actual needs. Where there are multiple servers, files are uploaded from the client to the servers as follows: the gateway distributes the files uploaded by the client to the servers in round-robin fashion according to a preset policy.

That is, all traffic for files uploaded from FPS-Client to FPS-Server passes through the TGW, which distributes the files to the FPS-Servers in round-robin fashion. After an FPS-Server receives a file, it stores the file content in HDFS (the distributed file system) and records the storage location information of the file in HBase (the distributed storage system).
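The round-robin distribution performed by the gateway can be sketched in Python. This is a minimal illustration under assumptions rather than a description of the TGW itself: the class name RoundRobinGateway, the method route and the server names are hypothetical, and the "preset policy" is taken to be plain round-robin.

```python
import itertools


class RoundRobinGateway:
    """Hands each uploaded file to the next server in a fixed rotation."""

    def __init__(self, servers):
        self._servers = itertools.cycle(servers)

    def route(self, filename: str) -> str:
        # The file name does not influence the choice; servers simply take turns.
        return next(self._servers)


gateway = RoundRobinGateway(["fps-server-1", "fps-server-2", "fps-server-3"])
assignments = [gateway.route(f"file-{i}") for i in range(5)]
```

With three servers and five uploads, the fourth upload wraps around to the first server, which spreads the load evenly as long as uploads are of similar size.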

It should be noted that HDFS is a distributed file system that uses ordinary PCs for storage. Each file generally has multiple replicas, for example three. Machines can easily be added dynamically and capacity grows linearly, so that, faced with the exponential growth of Internet services, capacity can be expanded without downtime to meet business needs. Meanwhile, storing the file location information in HBase supports indexes on the order of hundreds of billions of files. Like HDFS, HBase is a member of the Hadoop ecosystem, and its storage performance can likewise be increased by adding nodes.

In the embodiments of the present invention, FPS-Client and FPS-Server interact over HTTP (Hypertext Transfer Protocol).

The specific steps for implementing file storage in this embodiment are as follows:

Step S10: the server of the file processing system receives, through the client, the files uploaded by the sender;

That is, the server receives the files uploaded by the client through the TGW.

Step S20: the received files are cached in a temporary folder, and the storage location information of each file is recorded in the distributed storage system;

In this embodiment, as shown in FIG. 3, the FPS-Server receives, through the FPS-Client, the files sent by the sender. After receiving a file, the FPS-Server first caches it in a temporary folder, which is preferably a designated directory in HDFS. Once the file has been cached in the temporary folder, the FPS-Server records the storage location information of each file in the distributed storage system.

Step S30, merge the files in the temporary folder to obtain a merged file, and store the merged file in the distributed file system;

Specifically, the step of "merging the files in the temporary folder to obtain a merged file" includes:

Step 1, the server scans the directories in the temporary folder to acquire the lock corresponding to each directory;

Step 2, if the lock is acquired, the server scans each file in the temporary folder;

Step 3, obtain a combined file, determine among the scanned files a file whose size after merging with the combined file is smaller than a preset threshold, and merge the determined file into the combined file.

Further, after step 3, the method also includes:

Step 4, delete the merged files from the temporary folder.

That is, after the uploaded files are cached in the temporary folder, the FPS-Server scans the directories in the temporary folder to acquire the lock corresponding to each directory. If the lock is acquired, the FPS-Server scans the size of each file in the temporary folder and preferably merges those files whose size after merging with the combined file is smaller than a preset threshold. In this embodiment the preset threshold is set according to actual conditions and is not limited here. Files are merged as follows: obtain a preset combined file, scan each file in the temporary folder, determine among the scanned files those whose size after merging with the combined file is smaller than the preset threshold, and merge the determined files into the combined file. After merging, the merged file is stored in the distributed file system HDFS, and the index information corresponding to the merged files in the distributed storage system is then updated according to the merged file.
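The merge rule described above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the patented implementation: the names `CombinedFile` and `merge_small_files` and the `(combined file name, offset, length)` index layout are hypothetical, plain Python dictionaries stand in for the HDFS temporary folder and the HBase index, and directory locking is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class CombinedFile:
    """Hypothetical stand-in for the combined file kept in HDFS."""
    name: str
    data: bytearray = field(default_factory=bytearray)


def merge_small_files(pending: dict, combined: CombinedFile,
                      threshold: int, index: dict) -> list:
    """Merge every pending file that keeps the combined size below threshold.

    pending maps file id -> bytes (the temporary folder, assumed in memory).
    index maps file id -> (combined file name, offset, length); this layout
    is an assumption, the source only speaks of "storage location information".
    """
    merged = []
    for file_id in sorted(pending):
        payload = pending[file_id]
        # Step 3: merge only if the size after merging stays below the threshold.
        if len(combined.data) + len(payload) < threshold:
            offset = len(combined.data)
            combined.data.extend(payload)
            index[file_id] = (combined.name, offset, len(payload))
            merged.append(file_id)
    # Step 4: delete the merged originals from the temporary folder.
    for file_id in merged:
        del pending[file_id]
    return merged


pending = {"a": b"xx", "b": b"yyyy", "c": b"z"}
combined = CombinedFile("combined-0")
index = {}
merged_ids = merge_small_files(pending, combined, threshold=5, index=index)
```

With a 5-byte threshold, files "a" (2 bytes) and "c" (1 byte) are merged, while "b" (4 bytes) would push the combined file to 6 bytes and therefore stays in the temporary folder for a later pass.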

In this embodiment of the present invention, the files in the temporary folder may be scanned periodically or in real time; the small files found are then merged, and their post-merge storage location information is updated in HBase. Because the FPS-Server merges small files, the files subsequently stored in the distributed file system are less fragmented, which reduces the space that stored files occupy and noticeably saves memory on the cluster nodes.

For a better understanding of this embodiment, refer to FIG. 4. The FPS-Server first scans the temporary directory in HDFS and then acquires the lock corresponding to the directory. With the lock held, it scans each file in the directory, obtains the small files' index information from the distributed storage system, obtains the combined file, and merges the small files into it. Finally, it updates the index information and deletes the original small files from the distributed file system.

Step S40, based on the merged file, update the corresponding storage location information in the distributed storage system.

After obtaining the merged file, the FPS-Server updates the distributed storage system based on it, so that the storage location information corresponding to the merged files is updated there. In other words, the FPS-Server writes the merged file's storage location in the distributed file system to the distributed storage system, so that a later file lookup can use this storage location information to locate the corresponding content.

In addition, in this embodiment of the present invention, the method also includes:

Step A, on receiving a file query instruction, determine the index information corresponding to the file query instruction;

Step B, look up, in the distributed file system, the merged file that the index information points to;

Step C, restore the merged file, so as to recover from it the file corresponding to the index information.
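Steps A to C can be sketched as a lookup followed by a byte-range read. The sketch below assumes, purely for illustration, that the index information is a `(merged file name, offset, length)` tuple and that merged files are held in a dictionary; the source does not specify the index format, and `restore_file` is a hypothetical name.

```python
def restore_file(index: dict, merged_files: dict, file_id: str) -> bytes:
    """Recover one original file from the merged file that contains it."""
    name, offset, length = index[file_id]   # Step A: resolve the index information
    blob = merged_files[name]               # Step B: locate the merged file
    return blob[offset:offset + length]     # Step C: restore the original bytes


# Hypothetical data: one merged file holding two small files back to back.
merged_files = {"combined-0": b"xxz"}
index = {"a": ("combined-0", 0, 2), "c": ("combined-0", 2, 1)}
restored = restore_file(index, merged_files, "c")
```

Reading by offset and length means the merged file never has to be unpacked as a whole to serve a single query.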

In the technical solution of this embodiment, the server of the file processing system first receives, through the client, the files uploaded by the sender, caches the received files in a temporary folder, and records the storage location information of each file in the distributed storage system. It then merges the files in the temporary folder to obtain a merged file, stores the merged file in the distributed file system, and finally updates the corresponding storage location information in the distributed storage system based on the merged file, so that files can later be read according to that storage location information. Merging the received files before storing them in the distributed file system increases the number of files the system can hold. Moreover, because the distributed file system is scalable, storing files in it allows still more files to be stored. Compared with existing file storage approaches, this solution can store a larger volume of files and is better suited to storing large numbers of small files.

Further, referring to FIG. 5, a second embodiment of the file storage method of the present invention is proposed based on the first embodiment.

The second embodiment of the file storage method differs from the first embodiment in that, after step S30, the method further includes:

Step S50, the server generates file identification information and file hash information based on the file stored in the distributed file system;

Step S60, the file identification information and file hash information are fed back to the sender through the client, so that the sender can transmit them to a receiver;

Step S70, on receiving, through the client, the file identification information sent by the receiver, extract the file corresponding to the file identification information from the distributed file system and return it to the receiver, so that the receiver can verify the file against the file hash information and obtain the file if verification succeeds.

In this embodiment, after the FPS-Server stores a file in the distributed file system HDFS, it generates the file identification information (File Id) and file hash information (File Hash) for that file based on the file stored in HDFS. The FPS-Server then returns the File Id and File Hash to the sender, so that the sender can transmit them to the receiver.

Note that the sender and the receiver interact through the RMB message service bus. After receiving the File Id and File Hash, the receiver uses the FPS-Client to send the File Id to the FPS-Server.

When the FPS-Server receives, through the FPS-Client, the File Id sent by the receiver, it extracts the file corresponding to that File Id from HDFS and returns it to the receiver, so that the receiver can verify the file against the File Hash and obtain the file if verification succeeds. In other words, the receiver downloads the file from the FPS-Server by File Id and checks its correctness against the File Hash; both the download and the correctness check are performed in the FPS-Client.

For a better understanding of this embodiment, refer to FIG. 6. The sender uploads a file to the FPS, and once the upload succeeds the FPS returns the file's File Id and File Hash to the sender. The sender then sends the File Id and File Hash to the receiver over the RMB message service bus. On receiving the notification, the receiver uses the FPS-Client to send the File Id to the FPS-Server and download the file. The FPS-Client verifies the file against the File Hash and, once verification succeeds, reports to the receiver that the file was downloaded successfully.
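The File Id and File Hash exchange can be illustrated with a short sketch. The source names neither the hash algorithm nor the File Id format, so SHA-256 and a random UUID are assumptions made here, and `make_file_metadata` and `verify_download` are hypothetical names rather than FPS APIs.

```python
import hashlib
import uuid


def make_file_metadata(data: bytes) -> tuple:
    """Server side (Step S50): derive a File Id and a File Hash for a stored file."""
    file_id = uuid.uuid4().hex                     # assumed File Id format
    file_hash = hashlib.sha256(data).hexdigest()   # assumed hash algorithm
    return file_id, file_hash


def verify_download(data: bytes, expected_hash: str) -> bool:
    """Client side (Step S70): check downloaded bytes against the File Hash."""
    return hashlib.sha256(data).hexdigest() == expected_hash


uploaded = b"hello"
file_id, file_hash = make_file_metadata(uploaded)
ok = verify_download(uploaded, file_hash)            # intact download
corrupted = verify_download(b"hellp", file_hash)     # one byte changed in transit
```

Because the hash travels from sender to receiver separately from the file itself, a download that was altered or truncated fails the check in the FPS-Client before the receiver uses it.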

In this embodiment, file transfer between the sender and the receiver is carried out through the systems in the Internet data center, and the file is verified using the file identification information and file hash information, which improves the accuracy of file transfer.

Further, referring to FIG. 7, a third embodiment of the file storage method of the present invention is proposed based on the first embodiment.

The third embodiment of the file storage method differs from the first embodiment in that, after step S40, the method includes:

Step S80, the server scans each file in the distributed file system to monitor the storage duration of each file;

Step S90, when the storage duration of a file reaches a preset duration, delete the file from the distributed file system and delete the file's storage location information from the distributed storage system.

In this embodiment, the server preferably scans the files in the distributed file system on a timer. Accordingly, after files are stored in the distributed file system HDFS, the FPS-Server periodically runs a background job that scans for expired files and deletes them to save disk space. Specifically, the FPS-Server scans each file in HDFS to monitor how long it has been stored. The preset duration is set according to actual conditions and is not limited here; for example, it may be 3 months. When a file's storage duration reaches the preset duration, the file has been stored for a long time; to implement life-cycle management of file storage, the file is deleted from the distributed file system and its storage location information is deleted from the distributed storage system.
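The expiry scan in steps S80 and S90 can be sketched as below. This is an illustration only: the dictionaries stand in for HDFS file metadata and the HBase index, the paths are made up, `purge_expired` is a hypothetical name, and 90 days is used as an approximation of the 3-month example.

```python
from datetime import datetime, timedelta


def purge_expired(stored_at: dict, index: dict,
                  now: datetime, max_age: timedelta) -> list:
    """Delete files whose storage duration has reached max_age.

    stored_at maps file id -> time the file was stored (assumed metadata).
    index maps file id -> storage location information.
    """
    # Step S80: scan every file and compare its storage duration.
    expired = [fid for fid, ts in stored_at.items() if now - ts >= max_age]
    # Step S90: remove both the file record and its location information.
    for fid in expired:
        del stored_at[fid]
        index.pop(fid, None)
    return expired


now = datetime(2018, 1, 1)
stored_at = {"old": datetime(2017, 9, 1), "new": datetime(2017, 12, 15)}
index = {"old": "/fps/merged/combined-0", "new": "/fps/merged/combined-1"}
expired = purge_expired(stored_at, index, now, max_age=timedelta(days=90))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the scan deterministic and easy to test.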

In this embodiment, the Internet data center further includes a distributed application coordination service, and before step S80 the method further includes:

As shown in FIG. 3, the application coordination service is denoted ZooKeeper. Before deleting files, the FPS-Server first sends ZooKeeper a request to acquire the delete lock, and performs step S80 only if the lock is acquired successfully.

For a better understanding of the present invention, refer to FIG. 8. The FPS-Server periodically sends ZooKeeper a request to acquire the delete lock. If the lock is acquired, the FPS-Server requests the expired data from the distributed storage system HBase, that is, the storage location information of files whose storage duration has reached the preset duration, and deletes that storage location information once it is returned. The FPS-Server then requests the delete lock from ZooKeeper again and, after acquiring it, deletes from the distributed file system HDFS the files whose storage duration has reached the preset duration.
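The lock-guarded deletion can be sketched as follows. To stay self-contained, the sketch uses a local `threading.Lock` as a stand-in for the ZooKeeper delete lock and dictionaries as stand-ins for HBase and HDFS; it does not use any ZooKeeper client API. It also simplifies FIG. 8 by taking the lock once for both deletions, and `run_delete_job` is a hypothetical name.

```python
import threading


def run_delete_job(lock: threading.Lock, expired_ids: list,
                   index: dict, files: dict) -> bool:
    """Delete expired entries only if the delete lock can be acquired."""
    # Non-blocking attempt: if another server holds the lock, skip this round.
    if not lock.acquire(blocking=False):
        return False
    try:
        for fid in expired_ids:
            index.pop(fid, None)   # storage location information (HBase stand-in)
            files.pop(fid, None)   # stored file (HDFS stand-in)
        return True
    finally:
        lock.release()


delete_lock = threading.Lock()
index = {"old": "/fps/merged/combined-0"}
files = {"old": b"data"}

delete_lock.acquire()                                          # simulate another holder
skipped = run_delete_job(delete_lock, ["old"], index, files)   # lock busy, nothing deleted
delete_lock.release()
done = run_delete_job(delete_lock, ["old"], index, files)      # lock free, deletion runs
```

Guarding the job with a lock ensures that when several FPS-Server instances run the same timer, only one of them performs a given deletion round.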

In this embodiment, expired data is deleted on a timer, which gives stored files a life cycle: expired files are removed periodically, the file volume is kept from growing too large, and file storage becomes more intelligent.

Further, a fourth embodiment of the file storage method of the present invention is proposed based on the first embodiment.

The fourth embodiment of the file storage method differs from the first to third embodiments in that the server is located in a primary Internet data center and, where a standby Internet data center exists in the system, after step S40 the method includes:

Step D, the server synchronizes the stored file to the server of the file processing system in the standby Internet data center, so that the server of the file processing system in the standby Internet data center performs the file storage operation.

In this embodiment, multiple Internet data centers (IDCs) are deployed, for example two: a primary IDC and a standby IDC, with network connectivity between them. A business system, which integrates the FPS-Client, connects to the TGW and requests a file upload. The TGW routes the request to one of the FPS-Servers in the primary IDC according to a specified policy, after which the upload proper begins. The FPS-Server stores the file temporarily, and the file's location information is stored in HBase.
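How the gateway might spread uploads across FPS-Servers can be sketched with a round-robin policy. The source only says the TGW routes "according to a specified policy" (claim 5 mentions polling), so round-robin is an assumption here, and the `Gateway` class and server names are hypothetical.

```python
import itertools


class Gateway:
    """Hypothetical stand-in for the TGW, using round-robin routing."""

    def __init__(self, servers: list):
        # cycle() repeats the server list endlessly in order.
        self._cycle = itertools.cycle(servers)

    def route(self) -> str:
        """Return the FPS-Server that should handle the next upload request."""
        return next(self._cycle)


gateway = Gateway(["fps-server-1", "fps-server-2"])
targets = [gateway.route() for _ in range(3)]
```

Here three consecutive requests go to `fps-server-1`, `fps-server-2`, and then `fps-server-1` again.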

After a file is uploaded successfully in the primary IDC, the FPS-Server in the primary IDC asynchronously uploads the file to the FPS-Server in the standby IDC, which keeps the files in the two clusters consistent. FPS primary/standby synchronization uses logical backup.
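The asynchronous primary-to-standby upload can be sketched with a queue and a background worker. This is only an illustration of the asynchronous hand-off: dictionaries stand in for the two clusters, `start_replicator` and `upload_to_primary` are hypothetical names, and real replication between IDCs would go over the network.

```python
import queue
import threading


def start_replicator(standby: dict) -> tuple:
    """Start a background worker that copies queued files to the standby store."""
    tasks = queue.Queue()

    def worker() -> None:
        while True:
            item = tasks.get()
            if item is None:          # sentinel: stop the worker
                tasks.task_done()
                break
            file_id, data = item
            standby[file_id] = data   # replay the upload on the standby side
            tasks.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return tasks, thread


def upload_to_primary(primary: dict, tasks: queue.Queue,
                      file_id: str, data: bytes) -> None:
    """Store the file in the primary, then queue it for asynchronous replication."""
    primary[file_id] = data
    tasks.put((file_id, data))        # returns immediately; the copy happens later


primary, standby = {}, {}
tasks, thread = start_replicator(standby)
upload_to_primary(primary, tasks, "f1", b"abc")
tasks.join()                          # wait here only to make the demo deterministic
tasks.put(None)
thread.join(timeout=5)
```

The upload path returns as soon as the primary has the file, so replication latency does not slow down the sender; the cost is a short window in which the standby lags behind the primary.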

In this embodiment, because files are backed up, the standby IDC can continue to provide service if the primary IDC fails, so file storage and use are unaffected and availability is higher.

In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a file storage program which, when executed by a processor, implements the following operations:

Further, when executed by the processor, the file storage program also implements the operation of merging the files in the temporary folder to obtain a merged file:

Further, when executed by the processor, the file storage program also implements the following operations:

Further, after the step of storing the merged file in the distributed file system, the file storage program, when executed by the processor, also implements the following operations:

Further, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the file storage program, when executed by the processor, also implements the following operations:

Further, the Internet data center also includes a distributed application coordination service, and before the step in which the server scans each file in the distributed file system to monitor the storage duration of each file, the file storage program, when executed by the processor, also implements the following operations:

Further, the server is located in a primary Internet data center and, where a standby Internet data center exists in the system, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the file storage program, when executed by the processor, also implements the following operations:

Note that, in this document, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes it.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.

The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims

1. A file storage method, characterized in that it is applied to an Internet data center, the Internet data center comprising a file processing system, a distributed file system, and a distributed storage system, the file processing system comprising a server and a client, the file storage method comprising:

the server of the file processing system receiving, through the client, files uploaded by a sender;

caching the received files in a temporary folder, and recording the storage location information of each file in the distributed storage system;

merging the files in the temporary folder to obtain a merged file, and storing the merged file in the distributed file system;

updating, based on the merged file, the corresponding storage location information in the distributed storage system.

2. The file storage method according to claim 1, characterized in that the step of merging the files in the temporary folder to obtain a merged file comprises:

the server scanning each file in the temporary folder;

obtaining a combined file, determining among the scanned files a file whose size after merging with the combined file is smaller than a preset threshold, and merging the determined file into the combined file.

3. The file storage method according to claim 2, characterized in that the method further comprises:

on receiving a file query instruction, determining the index information corresponding to the file query instruction;

looking up, in the distributed file system, the merged file that the index information points to;

restoring the merged file, so as to recover from the merged file the file corresponding to the index information.

4. The file storage method according to claim 1, characterized in that, after the step of storing the merged file in the distributed file system, the method further comprises:

the server generating file identification information and file hash information based on the file stored in the distributed file system;

feeding back the file identification information and file hash information to the sender through the client, so that the sender transmits the file identification information and file hash information to a receiver;

on receiving, through the client, the file identification information sent by the receiver, extracting the file corresponding to the file identification information from the distributed file system and returning it to the receiver, so that the receiver verifies the file against the file hash information and obtains the file when verification succeeds.

5. The file storage method according to claim 1, characterized in that there are multiple servers, the servers of the file processing system are connected to the client through a gateway, and the manner in which a file is uploaded from the client to a server comprises: the gateway distributing the files uploaded by the client to the servers by polling according to a preset policy.

6. The file storage method according to claim 1, characterized in that, after the step of updating, based on the merged file, the corresponding storage location information in the distributed storage system, the method further comprises:

the server scanning each file in the distributed file system to monitor the storage duration of each file;

when the storage duration of a file reaches a preset duration, deleting the file from the distributed file system, and deleting the storage location information of the file from the distributed storage system.

7. The file storage method according to claim 6, characterized in that the Internet data center further comprises a distributed application coordination service, and before the step in which the server scans each file in the distributed file system to monitor the storage duration of each file, the method further comprises:

the server sending a request to acquire a delete lock to the distributed application coordination service;

when the lock is acquired successfully, performing the step of scanning each file in the distributed file system to monitor the storage duration of each file.

8. The file storage method according to any one of claims 1 to 7, characterized in that the server is located in a primary Internet data center and, where a standby Internet data center exists in the system, after the step of updating, based on the merged file, the corresponding storage location information in the distributed storage system, the method comprises:

the server synchronizing the stored file to the server of the file processing system in the standby Internet data center, so that the server of the file processing system in the standby Internet data center performs the file storage operation.

9. A server, characterized in that the server comprises a memory, a processor, and a file storage program stored in the memory and executable on the processor, wherein the file storage program, when executed by the processor, implements the steps of the file storage method according to any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that a file storage program is stored on the computer-readable storage medium, and the file storage program, when executed by a processor, implements the steps of the file storage method according to any one of claims 1 to 8.