CN118312359A - Data backup method, device and medium based on super fusion mechanism - Google Patents

Data backup method, device and medium based on super fusion mechanism

Info

Publication number
CN118312359A
Authority
CN
China
Prior art keywords
data
backup
data block
server
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410502053.7A
Other languages
Chinese (zh)
Inventor
刘丽莉
张菁
李彩芬
张逊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 15 Research Institute
Original Assignee
CETC 15 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 15 Research Institute
Publication of CN118312359A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F11/1451 Management of the data involved in backup or backup restore by selection of backup contents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data backup method, device and medium based on a super fusion mechanism. The method comprises: constructing a backup system comprising a management platform, a backup server, a storage server and an agent end, wherein the management platform uniformly schedules system tasks; the backup server manages the backup strategies in the management platform; the storage server stores backup data; and the agent end is managed by the backup server and executes backup operations. The agent end processes backup data using a variable-length block technique, which specifically comprises: determining the characteristics of the initial variable-length data blocks according to a file characteristic function; when a data change is detected, re-performing variable-length block segmentation according to the original data block boundaries; and constructing a fingerprint library of the variable-length blocks and comparing fingerprints at the agent end to achieve data deduplication.

Description

Data backup method, device and medium based on super fusion mechanism
Technical Field
The invention relates to the field of storage backup, in particular to a data backup method, device and medium based on a super fusion mechanism.
Background
In traditional technology, tape backup is slow: large volumes of data take a long time to back up, and addressing and reading data on tape media is time-consuming. The hardware configuration of a traditional backup system, such as computing, storage and networking, is fixed once set; when the data volume grows and expansion is needed, the equipment must be replaced wholesale, so incremental expansion is not possible. Single-machine storage cannot grow quickly because of technical limitations and struggles to meet the backup demand created by explosive data growth. Independent backup software, dedicated backup servers, independent storage devices, switches and the like coexist, so the components are complex and mutually incompatible; there is no unified platform to centrally manage the devices and monitor backup status, and an administrator can only configure and manage each element individually.
Hardware conditions limit backup and restore speed, making it difficult to meet recovery time objectives. Computing and storage resources cannot be elastically expanded on demand, so scalability is limited. Configuring and maintaining the various hardware and software requires professional skills, which increases labor cost. The overall system is tightly bound to the backup devices of a particular vendor, which limits flexibility and creates a risk of vendor dependence.
Disclosure of Invention
In view of the above, the present invention provides a data backup method based on a super fusion mechanism, which includes:
constructing a backup system comprising a management platform, a backup server, a storage server and an agent end, wherein
the management platform uniformly schedules system tasks; the backup server manages the backup strategies in the management platform; the storage server stores backup data; the agent end is managed by the backup server and executes backup operations; and the agent end processes backup data using a variable-length block technique, which specifically comprises:
determining the characteristics of the initial variable-length data blocks according to a file characteristic function; when a data change is detected, re-performing variable-length block segmentation according to the original data block boundaries; and constructing a fingerprint library of the variable-length blocks and comparing fingerprints at the agent end to achieve data deduplication.
In particular, the agent end further performs: acquiring metadata corresponding to incremental data, the metadata including information describing the incremental data; acquiring a write state flag of the metadata, the write state flag indicating whether the incremental data has been successfully written to the storage device; and, when the write state flag confirms that the incremental data was written successfully, reading the incremental data according to the metadata.
In particular, determining the characteristics of the initial variable-length data blocks according to the file characteristic function specifically includes: establishing fingerprint information for each data block using a hash algorithm implemented with vector-computation assembly instructions. Constructing a fingerprint library of the variable-length blocks and comparing fingerprints at the agent end to achieve data deduplication specifically comprises:
constructing, at the backup server, a fingerprint library that stores the fingerprints of the variable-length blocks; sending the fingerprint library to the agent end, where it is loaded for fingerprint comparison; and determining through fingerprint comparison whether a variable-length block is a duplicate, so that duplicate data blocks are selectively skipped.
In particular, re-performing variable-length block segmentation according to the original data block boundaries when a data change is detected specifically comprises: when the changed data is not at a data block boundary, keeping the original data block unchanged; when newly added data is located at a data block boundary, splitting the original data block into a plurality of data blocks; and when the changed data is within the sliding window, merging the original adjacent data blocks or adjusting their boundary.
In particular, the method further comprises: determining different backup strategies according to the importance of the data, wherein a backup strategy comprises:
a backup mode, a backup period and a backup data retention time R.
In particular, the backup data retention time R may be adjusted according to changes in the calculated adjusted data block size $S_z$:
if $S_z$ decreases, the retention time R can be shortened so that backup data is cleaned up promptly;
if $S_z$ increases, R can be extended appropriately so that more backup history is retained.
The backup data retention time R may specifically be set as: $R = R_0 (1 + k \cdot S_z)$
where $R_0$ is a preset base backup data retention time, k is an adjustment coefficient, and $S_z$ is the adjusted data block size.
In particular, in the calculation of the adjusted data block size $S_z$: let the original data block size be s(t), the changed data block size be s(t+1), and the number of data blocks be N; the calculation formula of the new data block size after variable-length block segmentation is:
$S_z$ represents the adjusted data block size; $|s(t+1) - s(t)|$ represents the absolute value of the amount of change between adjacent data blocks; $\max(s(t))$ and $\min(s(t))$ represent the maximum and minimum values of the original data block, respectively; ρ represents an adjustment coefficient. The calculated value of $S_z$ can be controlled by tuning the adjustment coefficient, which in turn adjusts the data retention time R.
Specifically, the adjustment coefficient ρ is calculated as: $\rho = k \times \log\left(\sum s(t+1) / \sum s(t)\right)$
where k is a constant coefficient; $\sum s(t+1)$ represents the total size of the changed data blocks; and $\sum s(t)$ represents the total size of the original data blocks.
In particular, the invention also provides a data backup system based on the super fusion mechanism, comprising a management platform, a backup server, a storage server and an agent end.
The management platform uniformly schedules system tasks; the backup server manages the backup strategies in the management platform; the storage server stores backup data; the agent end is managed by the backup server and executes backup operations; and the agent end processes backup data using a variable-length block technique, which specifically comprises:
determining the characteristics of the initial variable-length data blocks according to a file characteristic function; when a data change is detected, re-performing variable-length block segmentation according to the original data block boundaries; and constructing a fingerprint library of the variable-length blocks and comparing fingerprints at the agent end to achieve data deduplication.
In particular, the present invention also provides a computer-readable storage medium storing a computer program that causes a computer to execute the data backup method based on the super fusion mechanism described above.
The beneficial effects are as follows:
the super fusion backup system is simpler and more efficient to manage and maintain;
with the modular design of the invention, seamless expansion is achieved by adding nodes, so that resources scale in step with service demand;
the unified management platform reduces duplicated and wasted resources and lowers redundancy;
the built-in fault detection and disaster recovery technology improves the fault detection and recovery capability of the backup system;
the software-defined configuration and optimization method allows resource allocation to be optimized continuously, ensuring efficient operation;
the advanced security mechanisms supported by the invention, such as encryption and access control, improve data security;
the invention simplifies deployment and management and reduces overall hardware and operating costs;
customizable backup strategies allow different backup schemes to be implemented as needed;
variable-length block deduplication and source-side deduplication reduce the actual volume of backup data and increase backup speed;
the metadata verification mechanism ensures that correct backup data is read, improving data consistency.
Drawings
FIG. 1 is a schematic diagram of the data backup and restore system of the present invention;
FIG. 2 shows the system configuration and management architecture of the present invention;
FIG. 3 is an architecture diagram of a storage backup system that uses a storage management node or super-converged client as the data acquisition mode;
FIG. 4 is a schematic diagram of the precise and effective data backup architecture of the present invention.
Detailed Description
The invention will now be described in detail by way of example with reference to the accompanying drawings.
The invention provides a data backup method based on a super fusion mechanism, which comprises the following steps:
Step 1: construct a backup system comprising a management platform, a backup server, a storage server and an agent end, as shown in FIG. 1. According to the architecture design, the data backup and recovery system is built jointly from components such as the Backup Server, the Storage Server and the Agent, and the system is constructed and managed as shown in FIG. 1.
The backup server manages the data of all agent ends and the storage server. In the management platform, a system user can configure backup jobs for the client hosts that the current user is permitted to operate, including adding, modifying, deleting and executing jobs.
Adding a job: create backup jobs of different types, including file, database, operating system, application and virtual machine backup jobs. A new backup job requires configuration of host information, backup content, backup target, backup strategy and other related information.
Modifying a job: the information of a backup job can be modified.
Deleting a job: a backup job can be deleted.
The user configures job information on the backup server through the web management interface of the management platform, and the backup server sends the job instructions to the agent end.
The agent end executes backup and recovery jobs and stores the data directly on the storage server.
The agent end also executes data synchronization jobs, so that database data can be synchronized between clients.
According to pool copy instructions from the backup server, the storage servers copy backup data between storage pools on the same storage server or on different storage servers.
Backup data and related data such as the catalog can be backed up to a remote storage server at regular intervals, providing remote disaster recovery for the backup data.
The backup server must support deployment on autonomous and controllable architecture server equipment, forming an autonomous and controllable backup server. It provides the backup management platform, manages access by the backup agent ends and storage servers, uniformly monitors and manages service information such as backup, recovery and high availability for each client's resources, and stores information about the backup data. The backup server is the core module of the data backup management system: all system tasks and user operations are scheduled and executed by it, including job scheduling and dispatch, media read/write management and the like.
The storage server is deployed on autonomous and controllable architecture server equipment as a data storage server. It receives and stores backup data, providing storage for service systems and data, and supports full, incremental, differential and synthetic backup of different types of data such as unstructured data and databases.
The client agent end is deployed on the client server and consolidates the client's backup resources so that, once connected, the backup server can manage them in a unified way.
The agent end can be deployed in different environments such as physical machines and virtual machines. It receives job scheduling from the backup server, connects directly to the storage server, and executes backup or recovery tasks. By collecting, deduplicating, compressing and rate-limiting backup data, it reduces the bandwidth consumed by data transmission and improves backup efficiency.
The management platform uniformly schedules system tasks; the backup server manages the backup strategies in the management platform; the storage server stores backup data; the agent end is managed by the backup server and executes backup operations; and the agent end processes backup data using a variable-length block technique, which specifically comprises:
determining the characteristics of the initial variable-length data blocks according to a file characteristic function; when a data change is detected, re-performing variable-length block segmentation according to the original data block boundaries; and constructing a fingerprint library of the variable-length blocks and comparing fingerprints at the agent end to achieve data deduplication.
The fingerprint is the unique identification information of each data block. Whether fixed-length or variable-length block segmentation is used, after the data is segmented the system establishes fingerprint information for each data block using a hash algorithm whose performance is optimized with vector-computation assembly instructions.
Step 2: the fingerprint library of the data blocks is stored on the backup server. When deduplication is required, the fingerprint library is sent from the backup server to the client, and data block fingerprints are compared at the client, achieving source-side deduplication of the backup data.
In the backup job configuration, when backup data is to be stored in a variable-length storage pool, the variable-length block technique is used to perform source-side deduplication on the backup data.
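The fingerprint comparison described above can be illustrated with a minimal sketch. The text does not name the hash algorithm, so SHA-256 from the Python standard library is used here purely as an assumption; the function names `fingerprint` and `select_new_blocks` are hypothetical and not taken from the invention.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Return a fingerprint for one data block (SHA-256 is an assumed choice)."""
    return hashlib.sha256(block).hexdigest()


def select_new_blocks(blocks, fingerprint_library):
    """Return (fingerprint, block) pairs for blocks not yet in the library.

    `fingerprint_library` stands in for the library sent from the backup
    server to the client; blocks whose fingerprint is already known are
    skipped, which is the source-side deduplication step.
    """
    known = set(fingerprint_library)
    new_blocks = []
    for block in blocks:
        fp = fingerprint(block)
        if fp in known:
            continue  # duplicate block: nothing needs to be transferred
        known.add(fp)  # also deduplicates repeats within this batch
        new_blocks.append((fp, block))
    return new_blocks
```

Only the returned blocks would need to be sent to the storage server; in a full system the new fingerprints would also be reported back to the library held by the backup server.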
In variable-length block segmentation, the initial segmentation of the data must determine the block size used for each file. Block sizes may differ between files, but all blocks within the same file have the same size.
For different data files, a block size between 64K and 256K is selected dynamically according to a characteristic function of properties such as file size and disk distribution; that is, the block size may differ from one data file to another.
Within each file, a single block size is used for data segmentation.
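As an illustration of per-file block size selection, the sketch below uses a hypothetical characteristic function based on file size only. The rule of aiming for roughly 1024 blocks per file is an assumption made for this example; the text states only that the size lies between 64K and 256K and depends on properties such as file size and disk distribution.

```python
MIN_BLOCK = 64 * 1024   # lower bound stated in the text
MAX_BLOCK = 256 * 1024  # upper bound stated in the text


def choose_block_size(file_size: int) -> int:
    """Hypothetical characteristic function: larger files get larger blocks."""
    target = file_size // 1024  # assumed goal of about 1024 blocks per file
    rounded = -(-target // MIN_BLOCK) * MIN_BLOCK  # round up to a 64 KiB multiple
    return max(MIN_BLOCK, min(MAX_BLOCK, rounded))


def split_file(data: bytes, block_size: int):
    """Initial segmentation: one block size for the whole file."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

Under this assumed rule a 1 MiB file is split into 64 KiB blocks, a 100 MiB file into 128 KiB blocks, and anything of about 200 MiB or more into 256 KiB blocks.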
When data changes, whether by addition, modification or deletion, variable-length block segmentation is performed again, re-dividing the data mainly according to the data block boundaries. For the same file, the data is divided by a continuously sliding window based on the same block size, and the data block boundaries are determined.
When a data change occurs in a file and data is inserted into or deleted from a data object, re-segmentation is handled case by case:
If the changed content is not within the boundary of a data block, the data block is unchanged.
If newly added content falls within a data block boundary, the data becomes longer, so the changed data block must be re-divided according to the original block size: one data block is divided into several data blocks, each forming a new data boundary.
If the changed content falls within the sliding window and damages the boundary data block, the two data blocks are merged into one, or the boundary between them is moved to produce a new data block.
Therefore, when content is inserted or deleted, only one or two adjacent data blocks are affected and the remaining data blocks are untouched, which makes deduplication more accurate.
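The locality property stated above (an insertion disturbs only the blocks near the change) can be demonstrated with generic content-defined chunking. This is a swapped-in, widely used technique and not necessarily the segmentation rule of the invention: here a boundary is placed wherever the sum of the last `window` bytes satisfies a fixed condition, and the `window` and `divisor` values are arbitrary assumptions.

```python
def chunk(data: bytes, window: int = 4, divisor: int = 16):
    """Split data at positions chosen by a sliding-window test on the content.

    A cut is made after byte i when the sum of the `window` bytes ending at i
    is congruent to divisor - 1 modulo divisor. Because the test looks only
    at local content, boundaries far from an edit do not move relative to
    the data around them.
    """
    chunks, start = [], 0
    for i in range(window - 1, len(data) - 1):
        if sum(data[i - window + 1:i + 1]) % divisor == divisor - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    chunks.append(data[start:])  # final chunk (may be the whole input)
    return chunks


def changed_chunks(old: bytes, new: bytes):
    """Return the chunks of `new` that do not occur among the chunks of `old`."""
    old_set = set(chunk(old))
    return [c for c in chunk(new) if c not in old_set]
```

With this rule, inserting m bytes can create at most m + window chunks that were not already present, however large the file is, so only those chunks need new fingerprints and transfer.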
Optionally, service data is classified into core, important and general levels, and different strategies are issued according to users' actual demands and target metrics. A typical backup strategy for a general service system is as follows:
Incremental backups of application data (databases and the like) and user data are made daily, and a full backup of the data is made every Saturday.
The data portion of the database system is backed up every day: incremental backups are made Monday to Friday and a full backup is made on Saturday. One or more LOG backups between incremental backups are recommended, so that recovery is quick and effective when a fault occurs.
Daily data backups are executed according to a defined backup strategy. Data is retained for a period (for example, one week); after that period the data is overwritten and the disk space is reclaimed. All backup operations can run automatically without human intervention.
The weekly full system backup uses separate virtual tapes and its own backup strategy, and runs independently without interfering with the daily backups.
The operating system and application software may be backed up once a month.
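The weekly database schedule in the strategy above can be written as a small lookup. This is only a sketch: the function name `database_backup_type` is hypothetical, and leaving Sunday unscheduled is an assumption, since the text specifies only Monday to Friday and Saturday for the database.

```python
from datetime import date


def database_backup_type(day: date):
    """Return the database backup type for a day, or None if none is scheduled."""
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday <= 4:
        return "incremental"  # Monday to Friday
    if weekday == 5:
        return "full"         # Saturday
    return None               # Sunday: assumed to have no database backup
```

For example, 20 April 2024 was a Saturday, so `database_backup_type(date(2024, 4, 20))` returns `"full"`.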
Backup requires a large amount of space, and backup data on the media becomes less useful as time passes: recovery normally uses the most recent backup unless a particular historical state must be restored. Therefore, when formulating a backup strategy, the longest validity period of the data and the tolerable data loss window should be determined from how the data is operated and used; these in turn determine when backups run, the type of each backup, how empty media are used and how old media are reused. For database servers, the backup strategy should be formulated from an integrated, systematic and long-term perspective, so that daily backups run quickly and data from a given historical period is retained in full.
After a backup strategy is defined for each group of data and each database as needed, the system automatically backs up the data to the designated tape library at the defined times and in the defined mode.
Backup modes fall into three types: full backup, incremental backup and differential backup.
A full backup backs up all defined data every time. Its advantage is quick recovery; its disadvantages are a large backup volume and a long backup time when there is a lot of data.
An incremental backup backs up all data updated since the last backup. Its advantage is the small data volume of each backup; its disadvantage is that recovery needs the full backup plus multiple incremental backups.
A differential backup backs up all data updated since the last full backup.
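The difference between the three modes comes down to which reference time a file's modification time is compared against. The sketch below assumes that changes are detected by modification timestamps, which the text does not specify; `select_files` and its parameters are hypothetical names.

```python
def select_files(mtimes, mode, last_full, last_backup):
    """Return the sorted names of files to include in a backup.

    mtimes:      mapping of file name to modification time
    last_full:   time of the most recent full backup
    last_backup: time of the most recent backup of any type
    """
    if mode == "full":
        return sorted(mtimes)  # everything, regardless of modification time
    if mode == "incremental":
        cutoff = last_backup   # changes since the last backup of any type
    elif mode == "differential":
        cutoff = last_full     # changes since the last full backup
    else:
        raise ValueError(f"unknown backup mode: {mode}")
    return sorted(name for name, t in mtimes.items() if t > cutoff)
```

With files modified at times 1, 5 and 9, a full backup at time 3 and the latest backup at time 7, an incremental backup picks up only the last file, while a differential backup picks up the last two.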
The three modes can be combined flexibly. For example:
1) When the data volume is small, a full backup can be made every time, so that only one data source needs to be specified during recovery.
2) When the data volume is large, a daily full backup is inefficient, and full and incremental backups can be combined: for example, a full backup once a week (e.g., on Sunday) and an incremental backup on each of the other days (e.g., Monday through Saturday). Recovery then restores at most seven backup media in turn (for example: last Sunday, Monday, Tuesday, ... up to the previous day).
3) When the data volume is particularly large, even a weekly full backup can put significant pressure on the system. In that case full backup, cumulative incremental backup and incremental backup can be combined for a relatively efficient and fast scheme. For example, a full backup is made every month (e.g., at the beginning of the month), a cumulative incremental backup is made every Sunday, and an incremental backup is made on each of the other days. During restoration, the full backup from the beginning of the month is restored first, then the cumulative incremental backup from the last Sunday, and then the subsequent daily incremental backups in order (Monday, Tuesday, ... up to the previous day). At most 8 copies of data are restored; without cumulative incremental backups, up to 31 copies might have to be restored, and the recovery speed and complexity would be unsatisfactory.
4) Each policy in the backup software has a set of schedules that control its backups and archiving. These schedules are part of the policy definition, and each schedule of a policy applies to the policy's entire client and file list.
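The restore order described in item 3) can be sketched as a function that walks a chronological backup history. The tuple layout and the kind labels `"full"`, `"cumulative"` and `"incremental"` are hypothetical conventions chosen for this example.

```python
def restore_chain(backups):
    """Return the backups to restore, in order, from a chronological history.

    backups: list of (label, kind) tuples, oldest first. The chain is the
    most recent full backup, then the most recent cumulative incremental
    taken after it (if any), then every backup taken after that point.
    Raises ValueError if the history contains no full backup.
    """
    last_full = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    chain = [backups[last_full]]
    tail = backups[last_full + 1:]
    cumulative = [i for i, (_, kind) in enumerate(tail) if kind == "cumulative"]
    if cumulative:
        tail = tail[cumulative[-1]:]  # earlier incrementals are superseded
    chain.extend(tail)
    return chain
```

Given a full backup on the 1st, incrementals on the 2nd and 3rd, a cumulative incremental on the 4th and incrementals on the 5th and 6th, the chain is the full backup, the cumulative incremental, and the two later incrementals; the incrementals from the 2nd and 3rd are not needed.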
Optionally, incremental data is read based on metadata. A CDP backup technique that reads and sends incremental data asynchronously first constructs a monitoring record and commits it to the local disk, and then extracts the monitoring record from the disk and sends it to the backup server for backup. Before the data is synchronized to the backup server, multiple IO writes may hit the same location on the disk, so the incremental data being read or written may be inconsistent with the actual incremental data on the disk, which can cause incremental data to be read incorrectly. An incremental data reading method is used to solve this problem and improve the accuracy of incremental data reads.
The incremental data reading method comprises: acquiring incremental metadata, which describes the incremental data; acquiring the write state flag of the incremental metadata, which indicates whether the corresponding incremental data has been successfully written to the disk; and, when the write state flag is a write-complete flag, reading the corresponding incremental data from the disk according to the incremental metadata.
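A minimal sketch of the flag-gated read follows. The metadata fields (`offset`, `length`, `write_state`) and the flag values are assumptions for illustration; the text says only that the metadata describes the incremental data and carries a write state flag.

```python
from dataclasses import dataclass
from typing import Optional

WRITE_PENDING = 0   # assumed flag value: data not yet fully written to disk
WRITE_COMPLETE = 1  # assumed flag value: data written successfully


@dataclass
class IncrementMetadata:
    offset: int       # where the incremental data starts on the disk
    length: int       # how many bytes it covers
    write_state: int  # WRITE_PENDING or WRITE_COMPLETE


def read_increment(disk: bytes, meta: IncrementMetadata) -> Optional[bytes]:
    """Read incremental data only when its write state flag marks completion.

    `disk` stands in for the local disk contents. Returning None lets the
    caller retry later instead of sending data that may be inconsistent.
    """
    if meta.write_state != WRITE_COMPLETE:
        return None
    return disk[meta.offset:meta.offset + meta.length]
```

A caller would typically leave a record whose read returns None in the queue and revisit it once the pending write has finished.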
Optionally, the backup policy further includes a strategy that jointly constrains the backup data retention time R and the size of the database being backed up. The backup data retention time R may be adjusted according to changes in the calculated adjusted data block size $S_z$:
if $S_z$ decreases, the retention time R can be shortened so that backup data is cleaned up promptly;
if $S_z$ increases, R can be extended appropriately so that more backup history is retained.
The backup data retention time R may specifically be set as: $R = R_0 (1 + k \cdot S_z)$
where $R_0$ is a preset base backup data retention time, k is an adjustment coefficient, and $S_z$ is the adjusted data block size.
In the calculation of the adjusted data block size $S_z$: let the original data block size be s(t), the changed data block size be s(t+1), and the number of data blocks be N; the calculation formula of the new data block size after variable-length block segmentation is:
$S_z$ represents the adjusted data block size; $|s(t+1) - s(t)|$ represents the absolute value of the amount of change between adjacent data blocks; $\max(s(t))$ and $\min(s(t))$ represent the maximum and minimum values of the original data block, respectively; ρ represents an adjustment coefficient. The calculated value of $S_z$ can be controlled by tuning the adjustment coefficient, which in turn adjusts the data retention time R.
The adjustment coefficient ρ is calculated as: $\rho = k \times \log\left(\sum s(t+1) / \sum s(t)\right)$
where k is a constant coefficient; $\sum s(t+1)$ represents the total size of the changed data blocks; and $\sum s(t)$ represents the total size of the original data blocks.
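The two formulas that survive in the text, $\rho = k \times \log(\sum s(t+1) / \sum s(t))$ and $R = R_0 (1 + k \cdot S_z)$, can be evaluated as in the sketch below. Several points are assumptions: the logarithm base is not stated, so the natural logarithm is used; the formula for $S_z$ itself is not reproduced in the text, so $S_z$ is taken as an input; and the two k coefficients are kept as separate parameters because the text does not say whether they are the same value.

```python
import math


def adjustment_coefficient(old_sizes, new_sizes, k=1.0):
    """rho = k * log(total changed block size / total original block size).

    The natural logarithm is an assumption; the text does not give the base.
    """
    return k * math.log(sum(new_sizes) / sum(old_sizes))


def retention_time(r0, k, s_z):
    """R = R0 * (1 + k * S_z), with S_z supplied by the caller."""
    return r0 * (1 + k * s_z)
```

When the total block size is unchanged, ρ is 0; when it grows, ρ is positive. With a base retention of 7 days, k = 0.5 and S_z = 2, the retention time becomes 14 days.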
In particular, FIG. 3 shows the structure of a storage backup system that uses a storage management node or super-converged client as the data acquisition mode; in this storage backup method, file-level backup of the stored data is performed through the storage management node or super-converged client.
The invention provides a precise and effective data backup architecture, shown in FIG. 4. Agents matching the types of data to be backed up are installed on the servers to be backed up, and different backup strategies are formulated according to service level and data importance. Periodic backups complete the backup of valid data flexibly and effectively. Backup copies are retained as required, and storage space can be cleaned up automatically according to the defined copy retention conditions, saving storage media space and reducing cost.
In particular, the system of the invention also comprises the following modules:
policy configuration management includes:
Resource allocation management module
The management platform provides unified management of resource information, and information such as a backup source, backup equipment, a backup database and the like of the system is configured into the management platform, and a platform manager can manage related resource information, including new addition, modification, deletion, viewing and the like of resources.
The resource configuration includes configuration of information such as backup sources, backup devices, and the like.
Backup source configuration module
A backup source is a client host that needs to be backed up, together with the relevant content on that host. Backup source information is configured in the management platform. A system user with the appropriate authority can view the client hosts that the current user may operate and the resources under each host, including files, databases, operating systems and other backup resources.
The management platform supports viewing information about client host resources and modifying the registered name of a client host.
Backup device configuration module
Backup server: the platform provides information management for the backup server.
Storage pool: a storage pool is storage space of a configured capacity on the storage media server, provided to client hosts for storing backup sets.
Storage pool types include disk storage pools, tape storage pools, and so on.
Job configuration management module:
Backup job management
In the management platform, a system user can configure backup jobs on the client hosts that the current user may operate, including adding, modifying, deleting and executing jobs.
Adding a job: backup jobs of different types can be created, including file, database, operating system, application and virtual machine backup jobs. A new backup job requires host information, backup content, backup target, backup strategy and other related information.
Modifying a job: the information of a backup job can be modified.
Deleting a job: a backup job can be deleted.
Recovery job management module:
In the management platform, a system user can configure recovery jobs on the client hosts that the current user may operate.
A new recovery job requires the following information: host information, backup set, recovery target, recovery plan and related recovery configuration.
Report statistics and analysis module
The management platform provides report statistics covering storage capacity, job status, backup statistics and recovery statistics. To meet different reporting needs, users can define custom reports.
The system provides multi-tenant management, multi-dimensional report presentation and statistical billing across multiple products and services, and tenant-defined reports.
Resource full view module
After logging in to the platform, a tenant or user can view a monitoring overview of all managed resources, subject to the current user's authority. The overview presents basic information about system operation so that the tenant or user can follow the basic running state of the system in real time. It mainly displays hosts, resources, storage pools, and the number and state of jobs, together with storage device space usage, the system version number and similar information.
Report form display module
Based on the statistics a user has defined, the platform displays the statistical data the user needs, including:
statistical reports and charts generated from information such as storage devices, client running state and job completion;
statistical reports and charts of backup and recovery jobs, backup device usage and media usage over a period of time;
aggregate statistics such as storage capacity and backup rate, with data query support.
Data analysis module
The module provides multi-dimensional statistical analysis and trend analysis of historical backup data, for example by backup period and data type. From this analysis, users can clearly see the capacity expansion needs of the related resources.
User authority management module
Within a single independent system management domain, a multi-tenant management function is provided for the business systems and backup systems of each client. Although all clients share one management platform, multi-tenant technology gives each user its own resources and its own monitoring and management interface.
Multi-tenant isolation
The data of multiple tenants can be isolated in different ways through different data management means. A good data isolation method reduces the maintenance cost of the management platform, in both equipment and manpower.
Statistics and billing
Based on multi-tenant management and resource usage, the system can meter and bill the resources each user consumes. The statistics dimensions and billing bases include, but are not limited to, backup storage usage, number of backup clients, backup data storage time and bandwidth occupied by backups.
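As a rough illustration of per-tenant metering along the four dimensions listed above, the sketch below sums usage multiplied by a unit price. The `TenantUsage` record, the `RATES` table and every price in it are hypothetical; the source names the billing dimensions but gives no pricing model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantUsage:
    """Hypothetical per-tenant usage record, one field per billing dimension."""
    storage_gb: float      # backup storage usage
    client_count: int      # number of backup clients
    retention_days: int    # backup data storage time
    bandwidth_gb: float    # bandwidth occupied by backups


# Assumed unit prices, for illustration only.
RATES = {
    "storage_gb": 0.02,
    "client": 1.50,
    "retention_day": 0.01,
    "bandwidth_gb": 0.05,
}


def monthly_charge(usage: TenantUsage, rates=RATES) -> float:
    """Sum each usage dimension multiplied by its unit price."""
    total = (
        usage.storage_gb * rates["storage_gb"]
        + usage.client_count * rates["client"]
        + usage.retention_days * rates["retention_day"]
        + usage.bandwidth_gb * rates["bandwidth_gb"]
    )
    return round(total, 2)
```

A real billing module would more likely use `decimal.Decimal` for currency and tiered prices; floats are used here only to keep the sketch short.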
Operation and maintenance audit management module
Audit management
The platform maintains a log audit system that audits the operations of all users, including backup strategy configuration, backup strategy updates and recovery job configuration, together with the user and time of each operation. Once log information has been recorded for audit, it can only be viewed; it cannot be modified or deleted, which prevents tampering.
Log audit management helps prevent illegal operations and identify dangerous ones. When a problem occurs, its source can be traced from the log content.
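The source does not say how the view-only, tamper-resistant log is enforced. One common technique, shown here purely as an assumption, is an append-only log in which each entry stores the hash of the previous one, so any later modification breaks the chain. The `AuditLog` class and its field names are hypothetical.

```python
import hashlib
import json


class AuditLog:
    """Append-only log; each entry carries the hash of the previous entry."""

    GENESIS = "0" * 64  # placeholder hash that precedes the first entry

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(record: dict) -> str:
        # Hash only the content fields, in a stable key order.
        body = {key: record[key] for key in ("user", "action", "time", "prev")}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, user: str, action: str, time: str) -> dict:
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        record = {"user": user, "action": action, "time": time, "prev": prev}
        record["hash"] = self._digest(record)
        self._entries.append(record)
        return dict(record)

    def entries(self):
        """View-only access: callers receive copies, never the stored records."""
        return [dict(entry) for entry in self._entries]

    def verify(self) -> bool:
        """Recompute the chain and report whether any entry was altered."""
        prev = self.GENESIS
        for entry in self._entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True
```

The class exposes no update or delete method, which mirrors the view-only rule; `verify()` additionally detects changes made behind the interface.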
Client host upgrades
The connected and online clients can be upgraded in a centralized way through the management platform, and each client host does not need to be upgraded independently.
When the user with upgrading operation authority upgrades the host of the client, the host name, state, IP address, operating system information and current version of the currently connected client can be checked.
The user selects an upgrade package to be upgraded, the platform automatically uploads the upgrade package to the client host, and the client host executes automatic upgrade by default within a set time after the upgrade package is uploaded.
For the uploaded upgrade package, the platform provides a checking function, and can check the related information of the uploaded upgrade package, including the name, the modification time and the size of the upgrade package.
System management module
User management
The user management module manages and maintains user information in the platform. The platform supports multiple users; when a new user is created, a user name, password validity period, email address and contact phone number are entered as required. User management also covers modifying and deleting user information.
Role and rights management
In user management, different user roles can be defined and assigned corresponding rights, so that users in each role can access and operate only the corresponding modules. Users are divided into two types, system users and audit users, with the following roles:
System users: system administrators, tenants/users and monitors.
Audit users: audit administrators and auditors.
An example role correspondence table follows:
The user organization assigns its backup system operators, backup system leaders or planners, backup system administrators and other personnel to roles according to their operational and system needs. Different roles have different system operation rights, and after a user logs in, the platform presents the functions and information that match the rights of the user's role.
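The role model above can be sketched as a mapping from roles to permission sets. The five roles come from the text; the permission names and their assignment to roles are hypothetical, since the role correspondence table is not reproduced in this document.

```python
from enum import Enum


class Role(Enum):
    # System users
    SYSTEM_ADMIN = "system administrator"
    TENANT = "tenant/user"
    MONITOR = "monitor"
    # Audit users
    AUDIT_ADMIN = "audit administrator"
    AUDITOR = "auditor"


# Assumed permission sets, for illustration only.
PERMISSIONS = {
    Role.SYSTEM_ADMIN: {"configure_resources", "approve_changes", "manage_users", "view_monitoring"},
    Role.TENANT: {"configure_jobs", "run_jobs", "view_monitoring"},
    Role.MONITOR: {"view_monitoring"},
    Role.AUDIT_ADMIN: {"view_audit_log", "manage_auditors"},
    Role.AUDITOR: {"view_audit_log"},
}


def is_allowed(role: Role, permission: str) -> bool:
    """Check whether a role holds a given permission."""
    return permission in PERMISSIONS.get(role, set())


def visible_functions(role: Role) -> list:
    """Functions the platform would present after login, in a stable order."""
    return sorted(PERMISSIONS.get(role, set()))
```

Keeping system roles and audit roles in disjoint permission sets reflects the split between system users and audit users described in the text.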
Log management module
The log information includes two parts: job execution log, user operation log.
Job execution log
Executing a backup service generates log information, which a system user with the appropriate authority can download. It includes agent logs, backup server logs, storage server logs and so on.
User operation log
The system keeps a detailed record of every operation by a logged-in user, including viewing, adding, deleting and modifying data records. Each query of device information and each download of a backup file is also logged. The log records not only the operation itself but also information such as the time at which the operator performed it.
Flow management module
Configuration approval management
The management platform manages resource information in a unified way. Information such as the system's backup sources, backup devices and backup databases is configured in the management platform; a platform administrator can add, modify, delete and view the related resources, and a flow approval function is provided.
Backup source configuration
Update operations by a system user, such as renaming a client host, must first be submitted to a platform administrator for review and take effect only after the review is passed.
Backup device configuration
Backup devices comprise the backup server and the storage media server. The management platform supports updating device information, and updates take effect after a system administrator has approved them.
Backup server
Information management for the backup server is provided.
Backup server information comprises SMTP settings, server time and network card information. Users can update this information according to their authority, and the updated information is submitted to a system administrator for approval.
Storage pool
For a configured storage pool, users can modify information such as the pool name, capacity quota and backup set retention time; the modified information is submitted to a system administrator for approval.
Job configuration management
The management platform provides configuration of backup jobs and recovery jobs, including adding, modifying and deleting jobs. Job configuration operations must be reviewed and approved by a platform administrator, and only approved information can take effect.
Operation approval
In the management platform, every job configuration operation a user performs must be submitted to a system administrator for review, and only operations that pass review take effect and are executed.
Information approval management is built into the management platform: the platform's system administrator approves the data configured by platform users, and only approved information takes effect and starts to execute.
On the approval management page, the administrator can view pending and approved items. Each record includes the approval content, submitter and application time, and approved records additionally show the approval time.
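The approval flow described above can be sketched as a small state machine: a configuration change starts as pending and takes effect only after an administrator approves it. The `ChangeRequest` class and its field names are hypothetical; the recorded fields (content, submitter, application time, approval time) follow the text.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "to be approved"
    APPROVED = "approved"


@dataclass
class ChangeRequest:
    content: str                 # what is being configured or changed
    submitter: str               # user who submitted the change
    submitted_at: datetime       # application time
    status: Status = Status.PENDING
    approved_at: Optional[datetime] = None  # filled in only on approval

    def approve(self, approver_is_admin: bool, when: datetime) -> None:
        """Approve a pending request; only a system administrator may do so."""
        if not approver_is_admin:
            raise PermissionError("only a system administrator can approve")
        if self.status is not Status.PENDING:
            raise ValueError("request has already been approved")
        self.status = Status.APPROVED
        self.approved_at = when

    @property
    def effective(self) -> bool:
        """A change takes effect only once it has been approved."""
        return self.status is Status.APPROVED
```

A job scheduler built on this would filter on `effective` before executing any backup or recovery job, so unapproved configuration never runs.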
In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
It will be evident to those skilled in the art that the embodiments of the invention are not limited to the details of the foregoing illustrative embodiments, and that the embodiments of the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of embodiments being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units, modules or means recited in a system, means or terminal claim may also be implemented by means of software or hardware by means of one and the same unit, module or means. The terms first, second, etc. are used to denote a name, but not any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the embodiment of the present invention, and not for limiting, and although the embodiment of the present invention has been described in detail with reference to the above-mentioned preferred embodiments, it should be understood by those skilled in the art that modifications and equivalent substitutions can be made to the technical solution of the embodiment of the present invention without departing from the spirit and scope of the technical solution of the embodiment of the present invention.

Claims (10)

1. A data backup method based on a super fusion mechanism, the method comprising:
constructing a backup system comprising a management platform, a backup server, a storage server and an agent end, wherein
the management platform is used for uniformly scheduling system tasks; the backup server is used for managing the backup strategies in the management platform; the storage server is used for storing backup data; the agent end is managed by the backup server and is used for executing backup operations; and the agent end processes backup data using a variable-length block technique, which specifically comprises:
determining the characteristics of the initial variable-length data blocks according to a file characteristic function; when a data change is detected, re-performing variable-length block segmentation according to the original data block boundaries; and constructing a fingerprint library of the variable-length blocks and comparing fingerprints at the agent end to realize data deduplication.
2. The data backup method based on the super fusion mechanism according to claim 1, wherein, when executing an incremental backup, the agent end further performs: acquiring metadata corresponding to the incremental data, the metadata including information describing the incremental data; acquiring a write status flag of the metadata, the write status flag indicating whether the incremental data has been successfully written into the memory; and, when the write status flag confirms that the incremental data was written successfully, reading the incremental data according to the metadata.
3. The data backup method based on the super fusion mechanism as claimed in claim 1, wherein
determining the characteristics of the initial variable-length data blocks according to the file characteristic function specifically comprises: establishing fingerprint information for each data block using a hash algorithm implemented with vector-computation assembly instructions; and
constructing a fingerprint library of the variable-length blocks and comparing fingerprints at the agent end to realize data deduplication specifically comprises:
constructing, at the backup server, a fingerprint library that stores the fingerprints of the variable-length blocks; sending the fingerprint library to the agent end, where it is loaded for fingerprint comparison; and determining through fingerprint comparison whether a variable-length block is a duplicate, so that duplicate data blocks are selectively skipped.
4. The data backup method based on the super fusion mechanism as claimed in claim 3, wherein re-performing variable-length block segmentation according to the original data block boundaries when a data change is detected specifically comprises: when the changed data is not at a data block boundary, keeping the original data block unchanged; when newly added data is located at a data block boundary, splitting the original data block into a plurality of data blocks; and when the changed data is within the sliding window, merging the original adjacent data blocks or adjusting their boundaries.
5. The data backup method based on the super fusion mechanism according to claim 1, wherein the method further comprises: determining different backup strategies according to the importance of the data, wherein the backup strategies comprise:
backup mode, backup period and retention time R of backup data.
6. The data backup method based on the super fusion mechanism as claimed in claim 5, wherein
the backup data retention time $R$ may be adjusted according to the change in the calculated adjustment data block size $S_z$:
if the data block size $S_z$ decreases, the backup data retention time $R$ can be shortened so that backup data is cleaned up in time;
if the data block size $S_z$ increases, $R$ can be extended appropriately so that more backup history is retained;
the backup data retention time $R$ may specifically be set as: $R = R_0 \, (1 + k \cdot S_z)$,
where $R_0$ is a preset base backup data retention time, $k$ is an adjustment coefficient, and $S_z$ is the adjustment data block size.
7. The method for data backup based on a super fusion mechanism as defined in claim 6, wherein,
in the calculation of the adjustment data block size $S_z$: let the original data block size be $s(t)$, the changed data block size be $s(t+1)$, and the number of data blocks be $N$; the calculation formula of the new data block size after variable-length block segmentation is:
where $S_z$ represents the adjustment data block size; $|s(t+1) - s(t)|$ represents the absolute value of the amount of change between adjacent data blocks; $\max(s(t))$ and $\min(s(t))$ represent the maximum and minimum values of the original data blocks, respectively; $\rho$ represents an adjustment coefficient; and the calculated adjustment data block size $S_z$ can be controlled by tuning the adjustment coefficient, thereby adjusting the data retention time $R$.
8. The method for data backup based on a super fusion mechanism as defined in claim 7, wherein
the adjustment coefficient $\rho$ is calculated as: $\rho = k \times \log\left(\sum s(t+1) \,/\, \sum s(t)\right)$,
where $k$ is a constant coefficient; $\sum s(t+1)$ represents the total amount of the changed data blocks; and $\sum s(t)$ represents the total amount of the original data blocks.
9. A data backup system based on a super fusion mechanism, the system comprising a management platform, a backup server, a storage server and an agent end, wherein
the management platform is used for uniformly scheduling system tasks; the backup server is used for managing the backup strategies in the management platform; the storage server is used for storing backup data; the agent end is managed by the backup server and is used for executing backup operations; and the agent end processes backup data using a variable-length block technique, which specifically comprises:
determining the characteristics of the initial variable-length data blocks according to a file characteristic function; when a data change is detected, re-performing variable-length block segmentation according to the original data block boundaries; and constructing a fingerprint library of the variable-length blocks and comparing fingerprints at the agent end to realize data deduplication.
10. A computer-readable storage medium storing a computer program for causing a computer to execute the data backup method based on the super fusion mechanism as claimed in any one of claims 1 to 8.
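For readers who want a concrete picture of the variable-length block and fingerprint comparison steps recited in claims 1 and 3, the following is a minimal sketch under stated assumptions. The claims do not specify the file characteristic function or the hash; here a Gear-style rolling hash (the kind used in FastCDC) stands in for the boundary function and SHA-256 stands in for the vector-instruction hash. All function names, the table values and the size parameters are hypothetical.

```python
import hashlib

# Deterministic 256-entry table for the Gear rolling hash (illustrative values).
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big") for i in range(256)]
MASK64 = (1 << 64) - 1


def chunk(data: bytes, min_size: int = 64, avg_bits: int = 8, max_size: int = 1024):
    """Split data into variable-length blocks at content-defined boundaries."""
    boundary_mask = (1 << avg_bits) - 1
    blocks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64  # rolling hash over recent bytes
        length = i - start + 1
        # Cut when the hash hits the boundary pattern, or force a cut at max_size.
        if (length >= min_size and (h & boundary_mask) == 0) or length >= max_size:
            blocks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        blocks.append(data[start:])  # trailing partial block
    return blocks


def fingerprint(block: bytes) -> str:
    """Stand-in fingerprint; the claims call for a vector-instruction hash."""
    return hashlib.sha256(block).hexdigest()


def blocks_to_send(data: bytes, fingerprint_library: set):
    """Agent-side comparison: return only blocks whose fingerprint is new."""
    new_blocks = []
    for block in chunk(data):
        fp = fingerprint(block)
        if fp not in fingerprint_library:
            fingerprint_library.add(fp)
            new_blocks.append(block)
    return new_blocks
```

Because boundaries depend on content rather than on fixed offsets, an insertion early in a file tends to disturb only nearby blocks, so most later blocks keep their fingerprints and are skipped. In this sketch `fingerprint_library` plays the role of the library that the backup server sends to the agent end.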
CN202410502053.7A 2023-10-27 2024-04-25 Data backup method, device and medium based on super fusion mechanism Pending CN118312359A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202322894221 2023-10-27
CN2023228942215 2023-10-27

Publications (1)

Publication Number Publication Date
CN118312359A true CN118312359A (en) 2024-07-09

Family

ID=91727431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410502053.7A Pending CN118312359A (en) 2023-10-27 2024-04-25 Data backup method, device and medium based on super fusion mechanism

Country Status (1)

Country Link
CN (1) CN118312359A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118689711A (en) * 2024-08-27 2024-09-24 杭州泛海科技有限公司 A PLC system data integration management method, medium and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118689711A (en) * 2024-08-27 2024-09-24 杭州泛海科技有限公司 A PLC system data integration management method, medium and device
CN118689711B (en) * 2024-08-27 2025-01-28 杭州泛海科技有限公司 A PLC system data integration management method, medium and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination