A load-balance-preserving dynamic scaling method and system for a distributed cache
Technical field
The present invention relates to a dynamic scaling method for distributed caches and a corresponding system, and in particular to a distributed-cache dynamic scaling method and system that preserve load balance. It belongs to the field of software technology.
Background technology
In cloud computing environments, the number of users and the volume of network traffic have grown explosively. Providing solid support for high-capacity, mission-critical transactional applications on inexpensive, standardized hardware and software platforms has become a problem faced by many enterprises. The server-side bottleneck usually appears at the database, and distributed caching technology was introduced to further address this problem. A distributed cache shortens the distance between clustered object data and the applications that use them; as a key technology for accelerating data access and providing distributed data sharing, it plays an important role in improving a system's ability to scale and in safeguarding system reliability.
Forrester pointed out in a 2010 technical report that scalability is one of the key characteristics of a distributed caching system. The present invention divides cache-system scaling into two classes: static scaling and dynamic scaling. Static scaling means that, when cache nodes are added or removed, the running system must be stopped, the configuration information updated according to the node change, and the whole system restarted. Dynamic scaling, by contrast, is online: the system automatically completes the addition and deletion of cache nodes, the migration of cached data, and the update of configuration information in response to load changes, i.e. self-management and self-healing without human intervention. Traditional caching systems aim mainly at improving data access speed and are mostly applied to page caching, processing caching and data-object caching. Beyond accelerating application access, current caching systems are also used to store state data and to provide high-availability support, so data consistency and continuous service availability must be guaranteed during scaling. Most existing distributed caching systems, such as OSCache, Memcached and Terracotta EX, do not support dynamic scaling: the system must be restarted when its scale is changed, causing service interruption and data loss, which cannot meet application demands.
Realizing a dynamically scalable distributed caching system, however, faces two challenges. 1) During dynamic scaling, part of the cached data must be migrated between nodes. Interruption of the cache service during migration causes request failures or the loss of important business and state data, and the data inconsistency produced when multiple users concurrently operate on several replicas of the same data likewise brings losses to the user. In addition, data migration must take the migration overhead and the current system load into account: the migration progress must be controlled reasonably, lest excessive overhead overload the system and collapse the service. How to guarantee continuous availability of the cache service and data consistency during scaling is therefore a major challenge. 2) A caching system may contain hot-spot data regions, i.e. regions with relatively high access load, whose location usually depends on the characteristics of the application and the users' access patterns. Hot-spot regions easily become system bottlenecks and in turn affect the availability of the cache service, so rebalancing of hot-spot data must be considered during scaling.
In a distributed caching system, the data-balancing strategy distributes data across the cache server nodes and, when data are fetched, retrieves them from the node determined by computation; this avoids flooding-style searches and improves lookup efficiency. An early popular strategy is the remainder (modulo) algorithm, which is simple to implement: its core idea is to distribute data according to the remainder of the key modulo the number of server nodes. Its deficiency is that when the set of server nodes changes (e.g. on node failure or scaling), the server node to which a key is mapped changes drastically: only a small portion of the data can still be found on its former node, the rest having been reassigned to new nodes, so the hit rate drops sharply.
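The hit-rate collapse described above can be seen in a few lines. The sketch below (illustrative, not part of the patent) places keys with the remainder algorithm and counts how many are remapped when a fifth server joins a four-server cluster:

```python
def mod_place(key: int, n_servers: int) -> int:
    """Remainder algorithm: the key is stored on server key mod N."""
    return key % n_servers

keys = range(10_000)
before = {k: mod_place(k, 4) for k in keys}   # four cache servers
after = {k: mod_place(k, 5) for k in keys}    # a fifth server is added

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys changed servers")  # prints "80% ..."
```

Only keys whose value mod 20 lies in {0, 1, 2, 3} keep their server, so four fifths of the cache goes cold after a single node change.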
To address this problem, scholars at the Massachusetts Institute of Technology (D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, STOC '97, pp. 654-663, 1997) proposed in 1997 the now well-known consistent hashing algorithm, which is still widely used in fields such as application-layer multicast, P2P and caching platforms. Its basic idea is to define the output range of the hash function as a fixed ring-shaped space, called the hash ring; each server node is mapped to the value of a certain random position on the ring. The key of each cached data item is likewise mapped to a position on the hash ring; moving clockwise along the ring, the first position representing a server node identifies the cache server on which the item is stored. Consistent hashing solves the data-partitioning and routing problems well and minimizes the impact of node changes on the data distribution; it is illustrated in Fig. 1. The literature (D. Hastorun, M. Jampani, G. Kakulapati, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: Amazon's highly available key-value store," In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP '07), pp. 205-220, 2007) analyzes consistent hashing in depth and identifies two deficiencies: 1) the random distribution of service nodes on the hash ring causes uneven distribution of data and load; 2) the algorithm ignores the performance differences between service nodes. In real scenarios, the influence of hot-spot partitions on the system cannot be completely eliminated by improving the distributed hash algorithm alone.
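A minimal consistent-hash ring can be sketched as follows; the choice of MD5 as the hash function and the node names are assumptions for illustration, since the algorithm only requires a fixed ring-shaped hash space:

```python
import bisect
import hashlib

def h(s: str) -> int:
    """Hash a string onto the ring (here, the full MD5 output space)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Each node sits at the pseudo-random ring position given by its hash.
        self.ring = sorted((h(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        # The key is stored on the first node clockwise from its position.
        pos = h(key)
        idx = bisect.bisect_right([p for p, _ in self.ring], pos)
        return self.ring[idx % len(self.ring)][1]  # wrap around the ring

ring = HashRing(["cache-a", "cache-b", "cache-c"])
node = ring.lookup("user:42")
```

Removing any node other than a key's owner leaves that key's placement unchanged, which is exactly the "minimal disruption" property the text describes.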
Chiu et al. of Washington State University (D. Chiu, A. Shetty, G. Agrawal. Elastic Cloud Caches for Accelerating Service-Oriented Computations. In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10), pp. 1-11, 2010) studied the dynamic scaling problem of caching systems and proposed a data migration strategy based on a greedy method. When a user request inserts data into cache node n and that node's free memory is insufficient, a data migration event is triggered and the migration algorithm moves part of node n's data to other nodes, preferring the node with the lightest current load as the destination. If migration would overflow the destination node's memory, a new node is allocated; when migration completes, the hash mapping table is updated synchronously. For node shrinking, the two nodes with the lightest current load are chosen and their data merged. This work has two main deficiencies: first, the scaling process considers neither heterogeneous nodes nor the location of hot spots, so hot-spot partitions cannot be balanced dynamically; second, no data consistency guarantee is provided: migration operations may fail or lose data under adverse network conditions, and inconsistency of the clients' hash mapping tables may cause some users' requests to fail, or concurrent operations on multiple replicas of an object to produce inconsistency.
The research work of Flavio at ETH Zurich (http://systems.ethz.pubzone.org/servlet/Attachment?attachmentId=109&versionId=1378371) mainly addresses the dynamic scaling problem of cloud storage systems. It monitors hot-spot partitions online by statistical methods and migrates part of a hot partition's data to a more lightly loaded neighboring node; to ensure that migration completes smoothly, the hash algorithm used preserves the ordering between keys. Its deficiency is that load-balancing decisions are made locally at each cache node without global information, so the system usually needs many iterations to reach a stable equilibrium; convergence is slow and the introduced overhead is large.
Summary of the invention
The objective of the present invention is to overcome the problems in existing schemes by providing a distributed-cache dynamic scaling method and system that preserve load balance. The method monitors the resource utilization of each cache node online and decides how the system scales based on the computed weighted load values and the system's average load value. For the data-balancing problem during dynamic scaling, the method takes the influence of hot-spot data on system availability into account and proposes a load-balancing method suited to heterogeneous environments. For the data-consistency problem during dynamic scaling, the invention realizes a data access protocol based on three-phase requests; at the same time, to eliminate as far as possible the influence of data migration on system availability, the invention adopts a controlled data migration method that effectively regulates the migration progress and reduces migration overhead.
When scaling the caching system dynamically, the method mainly involves three fundamental mechanisms: data routing, data balancing and information synchronization. These three parts are described in detail below.
(1) The routing mechanisms of a distributed cache fall into three kinds: client-driven, server-driven and load-balancer routing. The method adopts client-driven routing: the client maintains the routing table and forwards each request directly to the destination node according to the routing table information. Compared with the other two mechanisms, this has the following advantages: routing a datum to its destination node needs only a single hop, which effectively reduces the network overhead of multi-hop forwarding; the cache servers need not perform route forwarding, which benefits their performance; and the response time is short.
(2) The data-balancing mechanism adopted by the invention adds support for heterogeneous nodes: each cache node i is assigned an initial weight w_i according to its processing capacity. The whole hash space is divided into Q equal-sized data partitions (Q >> K and Q >> N, where K is the number of replicas and N is the number of cache service nodes), and cache node i is mapped onto T_i partitions in proportion to its weight:

T_i = Q * w_i / (w_1 + w_2 + ... + w_N)    (1)
The mapping relations between partitions and cache nodes are kept in a partition-server mapping table (row i of the table stores the cache node corresponding to partition i). The client locates a data item by a two-level (secondary) hash mapping, illustrated in Fig. 2. First, the key of the data item is mapped onto the hash ring by a hash function, and the resulting position is called the token; a secondary hash mapping function then maps the token to a partition identifier between 1 and Q. Having obtained the partition identifier, the client looks up the cache node holding that partition in the partition-server mapping table. The partition-server mapping may change when the caching system scales dynamically; the client must then update this mapping table (also called the routing table or partition routing table) synchronously.
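The two-level lookup described above can be sketched as follows; the hash functions, Q = 16, and the partition-server table contents are all chosen for illustration:

```python
import hashlib

Q = 16  # number of equal-sized partitions (the patent requires Q >> N, Q >> K)

def token_of(key: str) -> int:
    """First-level hash: map the key onto the ring (here, a 32-bit space)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

def partition_of(token: int) -> int:
    """Second-level mapping: token -> partition identifier in 1..Q."""
    return token * Q // 2 ** 32 + 1

# Partition-server mapping table: row p holds the node caching partition p.
routing_table = {p: f"cache-{p % 3}" for p in range(1, Q + 1)}

def locate(key: str) -> str:
    """Client-side lookup: key -> token -> partition -> cache node."""
    return routing_table[partition_of(token_of(key))]

print(locate("order:1001"))
```

When the cluster scales, only the `routing_table` rows of moved partitions change; the two hash functions stay fixed, which is what makes the synchronous routing-table update cheap.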
(3) The partition routing table is the key to correct operation of the whole caching system. During dynamic scaling and data migration, the partition routing information may change, so a data synchronization mechanism is needed to guarantee that the routing information is updated in time and effectively confirmed. Existing data synchronization mechanisms mainly include client confirmation, server confirmation, TTL (Time To Live) and piggyback validation; the present invention synchronizes the partition routing information by piggyback validation. Because the amount of validation information is small, the network overhead introduced by this mechanism is relatively low, and neither the client nor the server needs to record much extra information, so no significant overhead is produced. Specifically, when the client sends a data access request to a cache server, the partition-routing version number and a synchronization signal are piggybacked on the request; when the server parses the request and finds the synchronization signal, it adds the synchronization result to the response; if the client receives an out-of-sync signal, it requests the latest routing information from the server and retries, otherwise the request completes.
Based on the above three mechanisms, the technical solution of the invention, illustrated in Fig. 3, comprises the following steps:
1) In each sampling window of a feature sampling period (or reference feature sampling period), monitor the utilization of the processor (CPU), memory and network resources of each cache node.
2) At the end of each feature sampling period, each cache node computes its weighted load value over that period according to equation (2):

L_i = (1/M) * SUM_{j=1..M} (alpha * CPU_ij + beta * Mem_ij + gamma * Net_ij)    (2)

where L_i denotes the weighted load value of cache node i in this feature sampling period; alpha, beta and gamma are the weights of CPU, memory and network respectively; CPU_ij, Mem_ij and Net_ij denote the CPU, memory and network resource utilization of node i in the j-th sampling window; and M is the number of sampling windows.
Each cache node sends the computed weighted load value to the cache cluster manager, which computes the average load value of the system in this feature sampling period according to equation (3) and decides how the system should scale:

L_avg = (1/N) * SUM_{i=1..N} L_i    (3)

where L_avg denotes the system average load value and N is the number of cache nodes. When the average load exceeds the threshold thre_max, a node expansion operation is executed; when it falls below the threshold thre_min, a node shrinking operation is executed. The cache cluster manager coordinates the cache nodes to complete the update of the partition routing information and the migration of the data partitions.
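Equations (2) and (3) and the scaling decision can be sketched together; the weights alpha, beta, gamma and the thresholds thre_max, thre_min below are illustrative values, not ones fixed by the patent:

```python
def weighted_load(cpu, mem, net, alpha=0.5, beta=0.3, gamma=0.2):
    """Equation (2): average of alpha*CPU_ij + beta*Mem_ij + gamma*Net_ij over M windows."""
    m = len(cpu)  # M sampling windows; cpu/mem/net hold per-window utilizations
    return sum(alpha * c + beta * w + gamma * n
               for c, w, n in zip(cpu, mem, net)) / m

def scale_decision(loads, thre_min=0.2, thre_max=0.8):
    """Equation (3): average the per-node loads, then decide on thresholds."""
    avg = sum(loads) / len(loads)
    if avg > thre_max:
        return "expand"
    if avg < thre_min:
        return "shrink"
    return "steady"

# Three nodes, each reporting two sampling windows of high utilization.
loads = [weighted_load([0.9, 0.95], [0.8, 0.85], [0.7, 0.9]) for _ in range(3)]
print(scale_decision(loads))  # -> "expand"
```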
3) When cache nodes are dynamically expanded, part of the cached data must be migrated between nodes. Considering the influence of hot-spot data on system availability, the method achieves load balance by moving data partitions off unbalanced service nodes, on top of the existing partition scheme. Suppose the system currently has N cache nodes; the goal is to solve for the data-partition migration scheme when an (N+1)-th node joins dynamically. The method uses transBytes_i to denote the average network traffic of cache node i during the feature sampling period and w_i to denote its weight. The variance D(T) of the weighted network traffic T measures how balanced the network traffic is; the weighted network traffic T_i and the average weighted network traffic T_avg are defined as:

T_i = transBytes_i / w_i,    T_avg = (1/N) * SUM_{i=1..N} T_i    (4)
Within a feature sampling period, if the weighted network traffic of a cache node satisfies |T_i - T_avg| <= epsilon (where epsilon is a threshold), the node is considered load-balanced; otherwise it is in an unbalanced state. The migration node set is defined as the set of nodes satisfying |T_i - T_avg| > epsilon. According to the relation between T_i and T_avg, the migration node set is further divided into a move-out set MigrationOutSet (node i belongs to it if and only if T_i - T_avg > epsilon) and a move-in set MigrationInSet (node i belongs to it if and only if T_avg - T_i > epsilon). The new node is also added to MigrationInSet. The basic unit of migration is the data partition; the per-partition network traffic of cache node i is denoted by the set {t_i1, t_i2, t_i3, ..., t_ik}, where k is the number of partitions on that node.
Ik, k represents the number of partitions of this node.Migratory direction is constrained to unidirectional, and namely data can only the node from MigrationOutSet be moved on the node among the MigrationInSet.Target function is shown in equation (5):
Target function:
The target of finding the solution data partition migration scheme is the variance sum minimum that makes migration node set weighted network flow T, and wherein S represents to move node number in the node set (S<N).The remaining space of supposing the node of moving into is space to be filled, and the redundant data of the node of moving out is article to be loaded, and then this problem is converted into the many knapsacks of 0-1 (0-1MKP) problem.The 0-1MKP problem is a np hard problem (S.Hanafi, A.Freville.An efficient tabu search approach for the 0-1 multidimensional knapsack problem.European of Operational Research, 1998,106:659~675), the present invention uses branch and bound method to find the solution the approximate optimal solution of this problem.
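The balance test and the split into move-out and move-in sets can be sketched as follows; the branch-and-bound solve of the resulting 0-1 MKP is elided, and the traffic figures and epsilon are invented for illustration:

```python
def migration_sets(trans_bytes, weights, eps):
    """Split nodes into move-out / move-in sets by weighted network traffic.

    T_i = transBytes_i / w_i; a node is unbalanced when |T_i - T_avg| > eps.
    """
    t = {n: b / weights[n] for n, b in trans_bytes.items()}
    t_avg = sum(t.values()) / len(t)
    out_set = {n for n, ti in t.items() if ti - t_avg > eps}   # MigrationOutSet
    in_set = {n for n, ti in t.items() if t_avg - ti > eps}    # MigrationInSet
    return out_set, in_set, t_avg

trans = {"a": 900, "b": 500, "c": 100}   # bytes moved in the sampling period
w = {"a": 1.0, "b": 1.0, "c": 1.0}       # homogeneous nodes for simplicity
out_set, in_set, t_avg = migration_sets(trans, w, eps=100)
print(out_set, in_set)  # {'a'} {'c'}, with T_avg = 500
```

Partitions would then be moved one-way from the `out_set` nodes into the `in_set` nodes (plus any newly joined node) until the objective of equation (5) is minimized.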
4) Based on the migration scheme obtained in step 3), the cache cluster manager coordinates the cache nodes to carry out the data migration. Data consistency and service availability during migration must be protected: the method adopts a three-phase-request data access protocol to realize consistent access to migrating data, and a controlled data migration method to prevent migration from imposing excessive overhead on the cache servers, thereby guaranteeing continuous availability of the cache service.
1. Data access protocol based on three-phase requests
To ensure continuity of user access during migration, a partition is deleted from the move-out node only after its migration has completed. During migration, the same data item may therefore exist simultaneously on the move-out node and the move-in node, and the two versions may be inconsistent. When a migration operation begins, the cache service nodes update their routing information; client and server synchronize the routing-table version number to judge whether the client's routing information is the latest version, and if it is stale the client must fetch the latest routing table. Consequently, when a client accesses data that are being migrated, the move-in node takes priority: if the values on the move-in and move-out nodes conflict (V_i != V_o), the cache client regards the move-in node's value V_i as valid and returns it (i.e. V_i is returned directly to the user), as shown in Fig. 4. When updating data, since it is hard to track the migration state of each individual item, the protocol updates both sides: the client first updates the value V_o on the move-out node (an update simply overwrites the data) and, after receiving the success acknowledgment, updates the move-in node, so that updates are not lost. (A lost update here refers to the case where the item was already migrated to the move-in node before the update request arrived; if only the move-out node were updated, a client reading from the move-in node would obtain stale, inconsistent data.) This is shown in Fig. 5.
Because the cache client performs partition routing actively, to achieve consistent access to migrating data the client must choose its data access mode according to the server state, keeping state synchronized. To this end, the method realizes a data access protocol based on client-side three-phase requests. In this protocol, the cache client keeps routing information of multiple versions, and request handling is divided into three phases. In the first phase, the client synchronizes routing information with the cache service nodes through piggyback validation, the result of which guarantees that the client holds the routing tables of the two latest versions (denoted RT_n and RT_{n-1}). In the second phase, after routing synchronization, the client routes with the latest partition routing table RT_n and requests the cached data, again using piggybacking. In the third phase, the client acts on the data value and the server state (migrating or stable) returned in the second phase. If the server is migrating, the client first checks whether the returned value is empty: if it is not empty, the requested data have already been migrated to the destination node, and the client returns the value directly to the requesting application service; if it is empty, the data have not yet been migrated, so the client looks up the move-out node in the previous routing table RT_{n-1} and sends the data request to that node. If the server is stable (expansion has finished and there is no migration in progress), the migration has completed successfully and the storage location of each cached item is unique, so the client returns the value directly to the requesting application service.
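The client side of the three-phase read can be sketched with in-memory stand-ins for the servers and routing tables; none of the names below come from the patent, and phase one (the piggybacked routing synchronization) is assumed to have already run:

```python
class Client:
    def __init__(self, rt_n, rt_n1, servers):
        self.rt_n, self.rt_n1 = rt_n, rt_n1     # latest and previous routing tables
        self.servers = servers                  # node name -> {key: value} store

    def get(self, key, server_state):
        node = self.rt_n[key]                   # phase 2: route with RT_n
        value = self.servers[node].get(key)
        if server_state == "stable" or value is not None:
            return value                        # phase 3: migrated (or no migration)
        old_node = self.rt_n1[key]              # phase 3: not migrated yet, so
        return self.servers[old_node].get(key)  # fall back to the move-out node

servers = {"new": {}, "old": {"k": "v1"}}       # "k" has not migrated yet
c = Client(rt_n={"k": "new"}, rt_n1={"k": "old"}, servers=servers)
print(c.get("k", server_state="migrating"))     # -> "v1", from the move-out node
```

Once the item lands on the move-in node, the same call returns the move-in node's value without consulting RT_{n-1}, which is the move-in-priority rule of Fig. 4.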
2. Controlled data migration
During dynamic expansion of cache nodes, resource utilization is near its saturation point, and data migration introduces extra network and computation overhead. If the move-out node is serving requests at that moment, this overhead may cause the service to terminate abnormally, while an excessive migration speed aggravates the resource pressure that already exists. To ensure that migration completes smoothly while eliminating as far as possible its influence on system availability, the invention adopts a controlled data migration method that regulates the migration progress according to the performance of the move-out node (cached data usually migrate from high-load to low-load nodes, so migration affects the move-out node most). The main factor affecting a cache server's request-handling efficiency is the I/O network bandwidth; in the method, the move-out node monitors its network overhead and, when the network flow rate reaches a configured bandwidth threshold, reduces the data migration speed by theta %.
At the same time, if the migration lasts too long or is interrupted frequently, the system stays in the migrating state for a long time, extra data synchronization overhead accrues, and overall system performance drops. The data migration must therefore not be halted completely while being slowed down; the method sets the minimum network flow rate available to migration at 10% of the node's network bandwidth.
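The throttling rule can be sketched as follows; the bandwidth threshold, theta = 30 and the bandwidth figure are illustrative values, while the 10% floor is the one stated above:

```python
def throttle(migration_rate, net_rate, bandwidth, threshold=0.8, theta=30):
    """Return the adjusted migration rate for the move-out node."""
    floor = 0.10 * bandwidth                  # stated minimum: 10% of bandwidth
    if net_rate >= threshold * bandwidth:     # node is nearing saturation
        migration_rate *= 1 - theta / 100     # slow down by theta percent
    return max(migration_rate, floor)         # but never halt migration entirely

bw = 1_000                                    # assumed node bandwidth, Mbit/s
print(throttle(400, net_rate=850, bandwidth=bw))  # saturated: 400 slows to 280
print(throttle(100, net_rate=950, bandwidth=bw))  # clipped at the 10% floor
print(throttle(400, net_rate=500, bandwidth=bw))  # below threshold: unchanged
```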
5) When cache nodes shrink dynamically, the data migration scheme is likewise determined first: the two nodes with the lightest load (denoted L) in the system, i' and j' with L_{j'} < L_{i'}, are selected for partition merging, and all partitions of node i' are migrated to node j'. Among the remaining cache nodes, according to the relation between the weighted network traffic T_i and the average weighted network traffic T_avg, the migration node set of nodes satisfying |T_i - T_avg| > epsilon is again divided into a move-out set and a move-in set. The branch-and-bound method is used to solve for the data-partition migration scheme, the objective being to minimize the sum of squared deviations of the weighted network traffic T over the migration node set. The data migration is carried out according to this scheme; consistency during migration is guaranteed by the three-phase-request data access protocol, and the controlled data migration method regulates the migration progress. When the migration finishes, node i' is removed from the cache cluster, as shown in Fig. 6.
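The first step of shrinking, merging the two lightest nodes, can be sketched as follows, with invented load values and partition assignments:

```python
def shrink_step(loads, partitions):
    """Pick the two lightest nodes (L_j' < L_i'), move i''s partitions to j'.

    Returns the name of node i', which leaves the cluster afterwards.
    """
    j_prime, i_prime = sorted(loads, key=loads.get)[:2]  # lightest, then next
    partitions[j_prime] += partitions.pop(i_prime)       # merge i' into j'
    del loads[i_prime]                                   # i' leaves the cluster
    return i_prime

loads = {"a": 0.7, "b": 0.1, "c": 0.2}
parts = {"a": [1, 2], "b": [3], "c": [4, 5]}
removed = shrink_step(loads, parts)
print(removed, parts)  # node "c" is removed; its partitions land on "b"
```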
Compared with the prior art, the present invention has the following technical advantages:
1) The data-balancing mechanism adopted by the invention adds support for heterogeneous nodes and at the same time considers the influence of hot-spot partitions on system availability at runtime. By moving data partitions off unbalanced service nodes, it keeps the network traffic evenly distributed among the cache nodes and optimizes resource utilization.
2) For the data-consistency problem during dynamic scaling, the invention realizes a data access protocol based on three-phase requests. The protocol synchronizes partition routing information by piggyback validation, introducing little synchronization overhead and being easy to implement. In the three-phase request handling, the client maintains routing information of multiple versions and, combined with routing and state synchronization with the server, solves the data-access inconsistency problem during migration; by regulating the migration progress with the controlled data migration method, the influence of data migration on system availability is effectively avoided.
3) The invention supports elastic resource provisioning and performance tuning for caching platforms. By monitoring all kinds of resource utilization on the cache nodes, it helps administrators discover performance bottlenecks in the system in time and completes the dynamic expansion of cache nodes automatically, guaranteeing continuity and consistency of service quality; when resource utilization is low, it completes the dynamic shrinking of nodes automatically, realizing on-demand supply of resources while reducing manual intervention.
Description of drawings
Fig. 1 illustrates consistent hashing;
Fig. 2 illustrates the secondary hash mapping;
Fig. 3 illustrates the dynamic expansion method of the distributed cache;
Fig. 4 illustrates a Get operation during data migration;
Fig. 5 illustrates an Update operation during data migration;
Fig. 6 illustrates the dynamic shrinking method of the distributed cache;
Fig. 7 illustrates the deployment topology;
Fig. 8 illustrates the three-phase request handling;
Fig. 9 illustrates the migration control flow;
Fig. 10 illustrates the dynamic expansion performance test results.
Embodiment
The invention is further described below in conjunction with specific embodiments and the accompanying drawings.
The whole distributed caching system consists of three parts: the cache servers (Cache Server), the cache clients (Cache Client) and the cache cluster manager (Cache Admin), connected to one another by a network. Each cache server runs independently and is monitored and managed uniformly by the cache cluster manager through a management agent. The management agent resides on the same physical node as its cache server and is responsible for generating JMX management MBeans; on receiving a control command from the cache cluster manager, the management agent automatically adapts the command and directs the cache service process to execute the corresponding operation.
Within the cache cluster manager, a topology monitor uses JGroups-based multicast to track changes in the server node topology and obtains the performance information of each service node by accessing the management agents remotely via JMX, finally presenting a unified monitoring view from which the whole cache cluster is controlled. The cache cluster manager mainly comprises a multicast communication component, the topology monitor, a controller, a cluster management module and a Web console. The cluster management module provides two major functions: data partition redistribution and cluster scale adjustment (including dynamic expansion and shrinking).
A cache server is divided into two parts: the cache service and the management agent. The cache service consists of a data management module, a command control engine, a state management module and a migration management module. The data management module manages the allocated memory space and is also responsible for the organization, storage and querying of cached data. The command control engine schedules cache requests according to the cache commands and the node state, performs the protocol-related processing of communication, and completes the handling of cache requests. The state management module manages the cache server's state, including the server migration state, the cluster scale-change state and the piggybacked routing information in client requests. The migration management module is responsible for migration work between nodes. The main duty of the management agent is to provide topology management and cache-service life-cycle management for the cache server cluster.
The cache client is mainly composed of modules such as a cache client API, a client kernel, a partition selector, a request forwarder, a multi-version configuration manager and an adaptive integration module, and is responsible for the communication and state synchronization between the application and the caching servers. Among these, the cache client API, the client kernel and the partition selector implement the basic functions of the client; the request forwarder and the multi-version configuration manager support the three-phase request data access protocol proposed by the method of the invention; and the adaptive integration module strengthens the maintainability and ease of use of the existing caching system. The cache client API provides a series of service interfaces including reading, writing and deleting data. The client kernel mainly comprises a node management module, a connection management module and a communication module, and provides the core functions of the cache client. The connection management module is responsible for maintaining all connections from the client to the caching servers and for the communication that uses these connections. The cache client is implemented on the basis of Java NIO; compared with the conventional blocking I/O model, Java NIO has the advantages of high efficiency and lower resource overhead. The conventional blocking I/O model often improves processing efficiency by pre-creating a number of connections to form a connection pool, whereas NIO needs to create only a single connection, which significantly reduces the overhead of thread creation and switching and better supports the processing of highly concurrent requests. After the node management module obtains a connection, it calls the communication module to carry out the network data exchange of the NIO mechanism. The partition selector implements a route selection algorithm based on consistent hashing: on data access, the key is first mapped to find the corresponding data partition, and the cache server corresponding to that partition is then found according to the partition routing table. The request forwarder is mainly responsible for handling the synchronization information returned from the server side and for forwarding subsequent requests according to this information. The multi-version configuration manager is responsible for maintaining the routing table information of a plurality of versions. When the client starts, the configuration manager can connect to any one caching server node, obtain and parse the routing information, and realize the self-configuration of the connection attributes; when the caching server side is in a migration state, the configuration manager provides support for historical routing information. Meanwhile, for the seamless integration of the caching system with existing application servers (for example Tomcat, JBoss, etc.) and to provide loosely coupled state backup and high-availability support, the adaptive integration module is responsible for adapting the state object module (Session) of each application server, so that the cache service can be used simply by setting a configuration file. The adaptive integration module also provides a general Java object serialization method, which converts Java objects into XML form to realize flexible serialization and deserialization.
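As an illustrative sketch of the partition selector's two-level lookup (key → hash ring position → data partition → server), the following Java fragment uses a plain `hashCode`-based placeholder rather than the actual distributed consistent hashing algorithm of the invention; the class and method names are assumptions, not part of the claimed implementation:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the partition selector: a key is hashed onto the ring, a
// secondary mapping finds its data partition, and the partition routing
// table maps the partition to a cache server. The hash function here is
// a placeholder, and the 12-partition layout follows the embodiment.
public class PartitionSelector {
    private final int partitionCount;
    private final Map<Integer, String> routingTable; // partition -> server

    public PartitionSelector(int partitionCount, Map<Integer, String> routingTable) {
        this.partitionCount = partitionCount;
        this.routingTable = routingTable;
    }

    // First mapping: key -> non-negative position on the hash ring.
    public int ringPosition(String key) {
        return key.hashCode() & 0x7fffffff;
    }

    // Secondary mapping: ring position -> data partition 1..partitionCount.
    public int partitionOf(String key) {
        return ringPosition(key) % partitionCount + 1;
    }

    // Route lookup via the partition routing table.
    public String serverFor(String key) {
        return routingTable.get(partitionOf(key));
    }
}
```

A usage example would build the routing table of the embodiment (partitions 1-4 on S1, 5-8 on S2, 9-12 on S3) and call `serverFor("Key1")`.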
In the experimental environment of the present embodiment, the front end uses LoadRunner to generate load, the middleware Web container adopts Tomcat to process user requests, and the back end consists of a DB2 database and the distributed caching system. The distributed caching system is mainly used to cache business data, accelerating application access while eliminating the database access bottleneck. The deployment topology is shown in Fig. 7, and the environment configuration is shown in Table 1.
The Web application adopted by the present embodiment is a simple online shopping system comprising functions such as goods browsing, goods ordering and order confirmation. The business data of this application is kept in the DB2 database; the initial book data amount is 3,000,000 book records, and each record comprises bibliographic information and book content information. The application transparently accesses the data in the caching system through the Hibernate framework. The test scripts are recorded with LoadRunner and comprise operations such as online bookstore page browsing and shopping list submission, with a think time of 0.3 seconds added between requests.
Table 1. Experimental environment configuration
The specific flow of the method of the present embodiment is as follows:
1) In the method of the present embodiment, the management agent is responsible for providing various statistics of the cache service side, such as cache read counts, update counts, hit counts and the network flow of each partition, and also provides status monitoring of the cache service node, such as CPU, memory and network utilization. In each sampling window of a feature sampling period, the management agent monitors the utilization of resources such as CPU, memory and network of its node (expressed in percentage form for ease of calculation). In the present embodiment each feature sampling period is defined as 300 seconds, made up of 10 sampling windows each 30 seconds wide.
2) When each feature sampling period ends, the management agent calculates the weighted load value over this period, where α, β and γ denote the weights of CPU, memory and network respectively. Considering that the critical system resources of a distributed caching system are memory and network I/O, with CPU of slightly lower priority, the present embodiment sets α, β and γ to 0.1, 0.4 and 0.5 respectively. The cache cluster manager calculates the system's average load value over this feature sampling period from the node monitoring information obtained remotely via JMX, and decides on the basis of this load value whether and how the system needs to scale. The present embodiment defines the node expansion threshold thre_max as 70% and the node shrinkage threshold thre_min as 30%; that is, when the system average load value is above 70% a node expansion operation is executed, and when it falls below 30% a node shrinkage operation is executed.
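The load evaluation of steps 1) and 2) can be sketched as follows; the weighted sum over CPU, memory and network utilization and the 70%/30% thresholds follow the embodiment, while the class and method names are illustrative assumptions:

```java
// Sketch of the weighted-load scaling decision of the embodiment:
// L = 0.1*CPU + 0.4*Mem + 0.5*Net (utilizations in percent), expand
// above 70% cluster-average load, shrink below 30%.
public class ScalingDecision {
    static final double ALPHA = 0.1, BETA = 0.4, GAMMA = 0.5;
    static final double THRE_MAX = 70.0, THRE_MIN = 30.0;

    // Weighted load of one node over a feature sampling period.
    public static double weightedLoad(double cpu, double mem, double net) {
        return ALPHA * cpu + BETA * mem + GAMMA * net;
    }

    // Decision based on the cluster-average weighted load.
    public static String decide(double[] nodeLoads) {
        double sum = 0;
        for (double l : nodeLoads) sum += l;
        double avg = sum / nodeLoads.length;
        if (avg > THRE_MAX) return "EXPAND";
        if (avg < THRE_MIN) return "SHRINK";
        return "STABLE";
    }
}
```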
3) During cache node dynamic expansion, a data partition migration scheme for the joining node must first be worked out. In the present embodiment, Server1, Server2 and Server3 are the existing server nodes in the caching system, and the fourth node, Server4, is the node to be added. The cache cluster manager obtains the network flow information (transBytes) of the three service nodes remotely via JMX. Since each cache node has an identical configuration, the weights w_i are all set to 1. On the basis of this information, the cache cluster manager calculates the weighted network flow T_i of each node and the system's average weighted network flow value over a feature sampling period. According to the relation between each node's deviation from the average weighted network flow and the threshold ε, the cache node states are divided into a balanced state and a non-balanced state; the present embodiment sets the threshold ε to 15%. Among the nodes in the non-balanced state, nodes whose weighted network flow exceeds the mean are put into the migration-out node set (MigrationOutSet), while the new node Server4 and the nodes below the mean are put into the migration-in node set (MigrationInSet). The basic unit of migration is the data partition, and the migration direction is constrained so that partition data on the cache nodes in MigrationOutSet moves onto the nodes in MigrationInSet. On the basis of the per-partition network flow information of the cache nodes and the migration node set information, the present embodiment adopts a branch and bound method to solve for an approximate optimal migration scheme, the objective being to minimize the sum of the variances of the weighted network flow T over the migration node sets.
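The classification into migration-out and migration-in sets described above can be sketched as follows, assuming equal node weights (w_i = 1) and ε = 15% as in the embodiment; the API is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch: a node is in the non-balanced state when its relative deviation
// |T_i - avg| / avg exceeds epsilon. Overloaded nodes join the migration-out
// set; the joining node and under-loaded nodes join the migration-in set.
public class MigrationSets {
    static final double EPSILON = 0.15;

    public static Map<String, Set<String>> classify(Map<String, Double> flowByNode,
                                                    String newNode) {
        double avg = flowByNode.values().stream()
                .mapToDouble(Double::doubleValue).average().orElse(0);
        Set<String> out = new TreeSet<>(), in = new TreeSet<>();
        in.add(newNode); // the joining node always receives partitions
        for (Map.Entry<String, Double> e : flowByNode.entrySet()) {
            double dev = Math.abs(e.getValue() - avg) / avg;
            if (dev > EPSILON) {
                if (e.getValue() > avg) out.add(e.getKey());
                else in.add(e.getKey());
            }
        }
        Map<String, Set<String>> result = new HashMap<>();
        result.put("MigrationOutSet", out);
        result.put("MigrationInSet", in);
        return result;
    }
}
```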
4) On the basis of the data migration scheme solved in step 3), the cache cluster manager first uses the controller to send the new partition routing information to all service nodes to update the routing configuration, and then uses the controller to direct each node to complete the data migration according to the migration plan. To guarantee data consistency during the migration process, the method of the invention adopts a data access protocol based on three-phase requests: on the service side, the state management module of a service node handles the migration state and the piggybacking of routing information onto client requests, while the client request forwarder handles the synchronization information from the service side and performs subsequent request forwarding.
1. Data access protocol based on three-phase requests
To explain the processing flow, suppose that initially there are three cache nodes Server1, Server2 and Server3 in the system, that the whole hash ring is divided into 12 equal-sized data partitions with each node holding 4 of them, and that the partition routing table is denoted RT_{n-1} (Server1, Server2 and Server3 are abbreviated S1, S2 and S3 in the routing table). When the new node Server4 joins during dynamic expansion, the data migration scheme solved in step 3) requires partitions 4, 8 and 12 to be moved from their former service nodes to Server4, and the latest partition routing table is denoted RT_n, as shown in Fig. 8. Suppose that the data migration has not yet finished, and the client receives a data access request whose key is Key1.
(1) The client performs a hash calculation on this key (adopting a distributed consistent hashing algorithm), maps the result to a position a on the hash ring, and then maps position a into data partition 4 through a secondary hash mapping. Suppose the client's current routing table version is t (t ≤ n−1; if the client has not requested data for a long time, its routing table version may lag behind). According to this routing information, data partition 4 is stored on server node ServerD (where D may be 1, 2 or 3), so the client sends a data access request to ServerD, piggybacking the version synchronization information in the request;
(2) After receiving the piggybacked information, caching server node ServerD performs a version check: if t < n−1 it returns the routing tables of the latest two versions, RT_n and RT_{n-1}; if t = n−1 it returns only the latest routing table RT_n. The first-phase request is now complete;
(3) The client reads the latest-version partition routing table RT_n; according to its routing information, Key1 is stored on server node Server4, so the client sends a data access request to Server4.
(4) Since the data migration operation has not yet finished, Server4 piggybacks a flag in the request response indicating that the server is in the migration state. If the requested data has already been migrated to Server4, the data is returned directly to the application service that issued the request; if it has not yet been migrated to Server4, a null value is returned. The second-phase request is now complete;
(5) When the client receives the message that the server is in the migration state, it judges whether the data value returned from the server is null. If it is not null, the data value is returned directly to the application service that issued the request; otherwise, if the returned value is null, an extra request must be made: the client locally reads the routing information of the previous version, RT_{n-1}; according to that version's routing information, Key1 is located on server node Server1, so the client sends a data access request to Server1 again;
(6) Server1 returns the requested result. The third-phase request is now complete;
(7) The cache client returns the result to the application service that issued the request.
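The three-phase flow above can be simulated in miniature as follows; the in-memory routing tables and stores are stand-ins for the real networked components, and the version synchronization of phase one is collapsed into a comment:

```java
import java.util.Map;
import java.util.Set;

// Minimal in-memory simulation of the three-phase access protocol during
// migration. All names and data structures are illustrative stand-ins.
public class ThreePhaseClient {
    final Map<Integer, String> rtOld;              // previous routing table RT_{n-1}
    final Map<Integer, String> rtNew;              // latest routing table RT_n
    final Map<String, Map<String, String>> stores; // server name -> its cached data
    final Set<String> migrating;                   // servers still in migration state

    public ThreePhaseClient(Map<Integer, String> rtOld, Map<Integer, String> rtNew,
                            Map<String, Map<String, String>> stores,
                            Set<String> migrating) {
        this.rtOld = rtOld; this.rtNew = rtNew;
        this.stores = stores; this.migrating = migrating;
    }

    public String get(String key, int partition) {
        // Phase 1 (collapsed): the first request piggybacks the client's routing
        // table version, and the server replies with RT_n (and RT_{n-1}).
        String newOwner = rtNew.get(partition);
        // Phase 2: ask the new owner; a null value while it is migrating means
        // the data has not yet been moved.
        String value = stores.get(newOwner).get(key);
        if (value != null || !migrating.contains(newOwner)) return value;
        // Phase 3: fall back to the owner under the previous routing table.
        return stores.get(rtOld.get(partition)).get(key);
    }
}
```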
2. Controlled data migration
The migration management module in a cache service node is responsible for the migration of data partitions between nodes; when a migration command arrives and has been parsed, it creates a background migration thread to carry out the data migration work. The migration thread obtains the block linked list of the corresponding partition through the migration partition view and starts the migration from the tail of the list, so that the data that joined the caching system earliest is migrated first and the data that joined latest is migrated last. A monitor watches the system's network traffic overhead; after the migration thread completes a transfer of fixed size, it obtains the system state from the monitor and adjusts the migration speed for the next migration cycle. In the method of the invention, when the network flow rate reaches the configured network bandwidth threshold, the migration speed is reduced by 20%; the minimum reserved network flow rate is 10% of the node's network bandwidth. After all data partition migrations have finished, the server state is reset to stable. This is illustrated in Fig. 9.
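The speed-adjustment rule of the controlled migration (cut by 20% when the bandwidth threshold is reached, never below the reserved 10% floor) may be sketched as follows; the units and field names are illustrative assumptions:

```java
// Sketch of the migration speed throttle: after each fixed-size transfer,
// the thread checks the monitored network rate; at or above the bandwidth
// threshold, the migration speed is cut by 20%, floored at 10% of the
// node's bandwidth. Units (e.g. MB/s) are illustrative.
public class MigrationThrottle {
    final double bandwidth;  // node network bandwidth
    final double threshold;  // configured bandwidth threshold
    double speed;            // current migration speed

    public MigrationThrottle(double bandwidth, double threshold, double initialSpeed) {
        this.bandwidth = bandwidth;
        this.threshold = threshold;
        this.speed = initialSpeed;
    }

    // Speed for the next migration cycle, given the observed network rate.
    public double nextSpeed(double observedRate) {
        if (observedRate >= threshold) {
            speed = Math.max(speed * 0.8, bandwidth * 0.10);
        }
        return speed;
    }
}
```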
The present embodiment tests this dynamic expansion method. The whole experiment uses LoadRunner to simulate concurrent user access; each transaction comprises 10 servlet requests, and the average transaction response time data collected during the experiment serves as the evaluation criterion, as shown in Figure 10. Initially there are three service nodes, Server1, Server2 and Server3, in the caching system, and the number of concurrent users is set to 600. In the warm-up phase, as the web application continually puts business data into the caching system, the system's average transaction response time gradually decreases; after the caching system reaches a certain load, application performance stabilizes with an average transaction response time of 12.9 seconds. The access load is then increased further, with the number of concurrent users raised to 1000; the caching server cluster reaches its resource saturation point, and dynamic expansion adds one new cache node, server4, to ensure the continuity of service quality. Since the data migration process introduces overhead into the system, the average transaction response time rises slightly at this point (by 0.8 seconds). After the expansion process completes, the system's processing capacity increases, the response time falls again and finally stabilizes at an average transaction response time of 8.1 seconds, a reduction of 4.8 seconds compared with the stable period before expansion.
To further prove the validity of the load balancing algorithm, the present embodiment defines the load imbalance degree of the cache cluster as the ratio of the variance D(T) of the weighted network flow T to the square of the average weighted network flow value. The load imbalance data collected after the new node joins are shown in Table 2. Compared with a dynamic expansion method without load balancing, the method of the invention has a better load balancing effect: the system's load imbalance degree is reduced by 82.4%.
Table 2. Load imbalance degree of the two methods

| Dynamic expansion with load balancing | Dynamic expansion without load balancing | Ratio of the two |
| 6.28×10⁻³ | 3.56×10⁻² | 0.176 |
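The load imbalance degree defined above, the variance of the weighted network flows divided by the square of their mean (i.e. the squared coefficient of variation), can be computed as in this sketch; the class and method names are assumptions:

```java
// Sketch of the load imbalance degree of the embodiment:
// imbalance = Var(T) / mean(T)^2 over the nodes' weighted network flows.
public class ImbalanceDegree {
    public static double imbalance(double[] t) {
        double mean = 0;
        for (double v : t) mean += v;
        mean /= t.length;
        double var = 0;
        for (double v : t) var += (v - mean) * (v - mean);
        var /= t.length;
        return var / (mean * mean);
    }
}
```

A perfectly balanced cluster (all flows equal) yields 0; larger values indicate a more skewed flow distribution.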
5) When the system average load value falls below the 30% threshold, a node shrinkage operation is executed. In the present embodiment, suppose Server1, Server2, Server3 and Server4 are the existing server nodes in the caching system; the first step is the formulation of the migration scheme. The two cache nodes with the lowest weighted load values L in the system are selected (assumed to be Server2 and Server3, with L_Server3 < L_Server2), and all partitions of Server2 are migrated to Server3. In addition, to balance the load among the remaining cache nodes Server1 and Server4, nodes whose weighted network flow exceeds the average value by more than the threshold ε′ are added to the migration-out node set, and nodes below the average value by more than ε′ are added to the migration-in node set, with the threshold ε′ set to 10%. The present embodiment adopts the branch and bound method to solve for an approximate optimal migration scheme. The data migration is completed on the basis of this migration scheme, the three-phase request data access protocol is used to ensure data consistency during the migration, and the controlled data migration method is adopted to effectively control the migration progress. When the migration operation finishes, node Server2 is removed from the cache cluster.
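The selection of the merge pair at the start of shrinkage can be sketched as follows; picking the two lightest nodes and draining the heavier of the pair matches the embodiment's Server2 → Server3 example, though the helper name is an assumption:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of shrinkage planning: select the two nodes with the lowest
// weighted load L; the heavier of the pair hands all of its partitions
// to the lighter one and is then removed from the cluster.
public class ShrinkPlanner {
    // Returns { nodeToRemove, mergeTarget }.
    public static String[] pickMergePair(Map<String, Double> loadByNode) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(loadByNode.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        // entries.get(0) is the lightest node (the merge target);
        // entries.get(1) is the second-lightest (the node to drain and remove).
        return new String[] { entries.get(1).getKey(), entries.get(0).getKey() };
    }
}
```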