CN103905250B

CN103905250B - A method for optimally managing cluster state

Info

Publication number: CN103905250B
Application number: CN201410106962.5A
Authority: CN
Inventors: 孟宪伟; 周博; 王倩
Original assignee: Inspur Electronic Information Industry Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2014-03-21
Filing date: 2014-03-21
Publication date: 2018-02-23
Anticipated expiration: 2034-03-21
Also published as: CN103905250A

Abstract

The invention provides a method for optimizing cluster state management. It applies to state management in large-scale high-availability clusters, covers cluster state, group state and resource state, and is aimed in particular at environments with demanding bandwidth and response-time requirements. Only the resource state update logic is retained; the separate group state update and cluster state update functions are removed, and the setting of group state and cluster state is handled entirely inside the resource state update logic. In addition, the cluster IP requirement is removed and the cluster processing logic is streamlined.

Description

A Method for Optimally Managing Cluster State

Technical field

The invention relates to the computer field, in particular to high-availability cluster management, and specifically to a method for optimizing the management of cluster state.

Background

In high-availability cluster management, state management is critical because it is both the trigger condition and the final step of every activity. Whether a cluster can remain highly available depends largely on the correctness and timeliness of its state management. In normal cluster activity, starting or stopping the cluster, a group or a resource triggers many resource, group and cluster state updates. Anyone who works with high-availability clusters regularly will find that these state updates take up most of the bandwidth and can even delay normal cluster activity. When an exception occurs in the cluster, state updates likewise slow down the cluster's handling of the exception.

Effectively reducing cluster state updates is therefore especially important for high-availability cluster management. In addition, existing cluster management generally sets up a separate cluster IP to mark the master node. This both hinders some state management and inconveniences users: in most application scenarios the cluster sits on an intranet, where IP addresses are a scarce resource. Removing the cluster IP requirement saves an IP address.

Summary of the invention

The invention uses an optimized cluster resource management method that improves cluster management efficiency, reduces bandwidth consumption and clarifies the state management logic. The method comprises the following aspects:

(1) Cluster state structure

The cluster state structure is the same as in existing high-availability cluster state management: there is one cluster state value and two state lists, the group state list and the resource state list.
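The structure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not name any fields or state values, so `ClusterState`, `State`, `add_resource`, the three state values and the resource-to-group map are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class State(Enum):
    # Assumed set of state values; the source does not enumerate them.
    STOPPED = "stopped"
    RUNNING = "running"
    FAILED = "failed"


@dataclass
class ClusterState:
    """One cluster state value plus the group and resource state lists."""
    cluster_state: State = State.STOPPED
    group_states: Dict[str, State] = field(default_factory=dict)     # group -> state
    resource_states: Dict[str, State] = field(default_factory=dict)  # resource -> state
    # Assumed helper map (resource -> group) so a resource can find its group.
    resource_group: Dict[str, str] = field(default_factory=dict)

    def add_resource(self, resource: str, group: str) -> None:
        """Register a resource in a group; both start out stopped."""
        self.resource_group[resource] = group
        self.resource_states[resource] = State.STOPPED
        self.group_states.setdefault(group, State.STOPPED)


state = ClusterState()
state.add_resource("vip0", "web")
state.add_resource("httpd", "web")
state.add_resource("mysql", "db")
```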

(2) State update logic

To simplify state updates, only the resource state update logic is retained. The group state update and cluster state update functions are removed, and the setting of group state and cluster state is handled entirely inside the resource state update logic. Although this adds logic to the handling of each individual resource state, the total number of state update commands falls, so state updating as a whole consumes considerably fewer resources.

a) Single resource state update

When a resource is started or stopped, or a single resource reports an exception, the node concerned is triggered to send a state update command for that resource. On receiving it, the master node updates its local resource state list, then updates the state of the resource's group in the group state list and the cluster state.
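A minimal sketch of the master-side handling, assuming a simple roll-up rule that the patent does not specify: a group (or the cluster) is "failed" if any member is failed, "running" if any member is running, and "stopped" otherwise. `MasterState`, `update_resource` and `roll_up` are hypothetical names.

```python
def roll_up(states):
    """Assumed aggregation rule: failed beats running beats stopped."""
    states = list(states)
    if "failed" in states:
        return "failed"
    if "running" in states:
        return "running"
    return "stopped"


class MasterState:
    def __init__(self, resource_group):
        self.resource_group = dict(resource_group)  # resource -> group
        self.resource_states = {r: "stopped" for r in self.resource_group}
        self.group_states = {g: "stopped" for g in set(self.resource_group.values())}
        self.cluster_state = "stopped"

    def update_resource(self, resource, state):
        """The only update entry point: group and cluster state are set here."""
        self.resource_states[resource] = state
        group = self.resource_group[resource]
        members = [r for r, g in self.resource_group.items() if g == group]
        self.group_states[group] = roll_up(self.resource_states[r] for r in members)
        self.cluster_state = roll_up(self.group_states.values())


master = MasterState({"vip0": "web", "httpd": "web", "mysql": "db"})
master.update_resource("httpd", "running")
master.update_resource("mysql", "failed")
```

There is deliberately no `update_group` or `update_cluster` method: every change to group or cluster state is a side effect of a resource update.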

b) Group state update

After a group start or stop operation, the executing node returns the result for each resource to the master node, and the master node updates the resource state list and the cluster state in turn according to the returned results.
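The group case can be illustrated as a replay of per-resource results through the same single update path. This is a sketch only: the shape of the returned results (a list of resource and state pairs) and the names `apply_group_result` and `update_resource` are assumptions, and the group and cluster roll-up is left as a comment.

```python
def apply_group_result(results, update_resource):
    """Replay the executing node's per-resource results, in order,
    through the single resource update entry point."""
    applied = []
    for resource, state in results:
        update_resource(resource, state)
        applied.append(resource)
    return applied


# Stand-in for the master's resource update logic (hypothetical).
resource_states = {"vip0": "stopped", "httpd": "stopped"}


def update_resource(resource, state):
    resource_states[resource] = state
    # A full implementation would also set group and cluster state here.


# Assumed reply from the node that executed "start group web".
order = apply_group_result([("vip0", "running"), ("httpd", "failed")], update_resource)
```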

(3) State synchronization

To keep the cluster highly available, all nodes in the cluster must share the state of its resources and groups. After the master node has processed a resource state update, it therefore synchronizes the update to all other nodes in the cluster. Only resource state is synchronized: each node receives the resource state update, updates its local resource state in the same way, and updates the group state and cluster state in its internal logic.
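The synchronization step might look like the sketch below, where the message sent to peers carries resource state only and every node derives group and cluster state locally. The in-process method call stands in for the network transport, and the message layout, `Node`, `master_handle_update` and the roll-up rule are all assumptions rather than details from the patent.

```python
def roll_up(states):
    """Assumed aggregation rule: failed beats running beats stopped."""
    states = list(states)
    if "failed" in states:
        return "failed"
    if "running" in states:
        return "running"
    return "stopped"


class Node:
    def __init__(self, resource_group):
        self.resource_group = dict(resource_group)  # resource -> group
        self.resource_states = {r: "stopped" for r in self.resource_group}
        self.group_states = {}
        self.cluster_state = "stopped"
        self._derive()

    def _derive(self):
        """Recompute group and cluster state from resource state alone."""
        for group in set(self.resource_group.values()):
            members = [r for r, g in self.resource_group.items() if g == group]
            self.group_states[group] = roll_up(self.resource_states[r] for r in members)
        self.cluster_state = roll_up(self.group_states.values())

    def apply(self, resource, state):
        self.resource_states[resource] = state
        self._derive()


def master_handle_update(master, peers, resource, state):
    """Update the master, then send the same resource-only message to each peer."""
    master.apply(resource, state)
    message = {"resource": resource, "state": state}  # no group or cluster fields
    for peer in peers:
        peer.apply(message["resource"], message["state"])
    return message


layout = {"vip0": "web", "httpd": "web", "mysql": "db"}
master, peers = Node(layout), [Node(layout), Node(layout)]
sent = master_handle_update(master, peers, "httpd", "running")
```

Because every node applies the same derivation to the same resource states, group and cluster state converge without ever being transmitted.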

(4) State retrieval

External clients retrieve state through the console, which connects to the master node to access the cluster state. The master node answers each request directly from its local cluster state lists, according to what was requested.
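As a rough illustration, a console request can be served as a pure lookup against the master's local state, with no messages to other nodes. The request scopes ("cluster", "group", "resource"), the dictionary layout and the function name `get_state` are hypothetical.

```python
def get_state(local, scope, name=None):
    """Answer a console query from the master's local state only."""
    if scope == "cluster":
        return local["cluster"]
    if scope == "group":
        return local["groups"][name]
    if scope == "resource":
        return local["resources"][name]
    raise ValueError(f"unknown scope: {scope!r}")


# Assumed snapshot of the master's local state.
local = {
    "cluster": "running",
    "groups": {"web": "running", "db": "stopped"},
    "resources": {"vip0": "running", "httpd": "running", "mysql": "stopped"},
}

cluster_answer = get_state(local, "cluster")
group_answer = get_state(local, "group", "db")
resource_answer = get_state(local, "resource", "httpd")
```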

(5) Removing the cluster IP

The cluster IP exists to mark the master node. However, the cluster IP remains present even when all cluster resources are stopped, which conflicts with this method's approach of deriving the cluster state from resource state, and it also ties up a valuable intranet IP address.

To use the optimized state management method proposed here, the cluster IP setting must therefore be removed. Only the cluster IP setting is removed; the master node role still exists. The nodes know which node is the master, and once the master node has been decided, the console must be notified so that external clients can connect to the master node to obtain cluster information.
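One way to picture the arrangement without a cluster IP: the console stores the master's own address and is told whenever a master is decided. The patent does not say how the master is chosen, so the rule used here (lowest-named live node) is purely an assumption, as are `Console`, `on_master_decided`, `announce_master` and the example addresses.

```python
class Console:
    """Keeps the master's real address instead of a floating cluster IP."""

    def __init__(self):
        self.master_address = None

    def on_master_decided(self, address):
        self.master_address = address


def announce_master(nodes, console):
    """Pick a master (assumed rule: lowest-named live node) and notify the console."""
    master = min(name for name, info in nodes.items() if info["alive"])
    console.on_master_decided(nodes[master]["address"])
    return master


nodes = {
    "node1": {"address": "10.0.0.1", "alive": False},
    "node2": {"address": "10.0.0.2", "alive": True},
    "node3": {"address": "10.0.0.3", "alive": True},
}
console = Console()
master = announce_master(nodes, console)
```

Each node keeps only its own address, so no extra intranet IP is reserved for the cluster.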

Compared with the prior art, the invention has the following beneficial effects:

It provides a method for optimizing cluster state management that saves bandwidth and preserves the cluster's communication efficiency, simplifies the processing logic and so lowers the probability of program errors, and removes the cluster IP setting, saving IP resources. The optimized state management logic and the management approach without a cluster IP give cluster management a more convenient path, improving the management efficiency of high-availability cluster management software and reducing its bandwidth usage.

Brief description of the drawings

Figure 1 is a flow chart of state update and retrieval according to the invention.

Detailed description

(1) Cluster state structure

(2) State update logic

a) Single resource state update

b) Group state update

(3) State synchronization

(4) State retrieval

(5) Removing the cluster IP

Claims

1. A method for optimizing the management of cluster state, characterized in that the method mainly involves two parts: first, all cluster actions trigger only resource state changes; second, the cluster IP setting on the master node is removed;

The method mainly comprises the following:

1) The three kinds of state still exist, but only the resource state trigger is retained; group state and cluster state processing take place inside the resource state processing logic;

2) All nodes in the cluster must share the state of the cluster's resources and groups; therefore, after the master node finishes updating the resource state, it synchronizes the update to all other nodes in the cluster. Each node receives the resource state update, updates its local resource state in the same way, and updates the group state and cluster state in its internal logic;

3) External clients retrieve state through the console, which connects to the master node to access the cluster state; requests are answered directly from the local cluster state lists according to the specific access requirements;

4) The cluster IP is removed and only the master node marker is kept, of which the console must be separately notified.