
CN111930502A - Server management method, device, equipment and storage medium - Google Patents

Info

Publication number
CN111930502A
Authority
CN
China
Prior art keywords
server
servers
idle
computer cluster
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010760328.9A
Other languages
Chinese (zh)
Inventor
戴超群
周佳佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Jiaochi Artificial Intelligence Research Institute Co ltd
Original Assignee
Suzhou Jiaochi Artificial Intelligence Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Jiaochi Artificial Intelligence Research Institute Co ltd
Priority to CN202010760328.9A
Publication of CN111930502A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Power Sources (AREA)

Abstract

Embodiments of the present invention disclose a server management method, apparatus, device, and storage medium. The method includes: acquiring the number of queued tasks and the number of idle servers in a computer cluster; determining whether the number of queued tasks is greater than a preset task-count threshold and whether the number of idle servers is less than a preset server count; and, if so, acquiring a target available server from the bootable server list of the computer cluster and controlling the target available server to perform a power-on operation. The available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process. Embodiments of the present invention can power on servers automatically according to how scarce resources currently are for the cluster's computing tasks, so that computing tasks are processed in time, the power consumption of the whole cluster stays at a level matched to the computing tasks, and resources are not wasted.

Description

A server management method, apparatus, device, and storage medium

Technical field

Embodiments of the present invention relate to the field of computer technology, and in particular to a server management method, apparatus, device, and storage medium.

Background

In a computer cluster, the computing resources are usually made up of multiple servers. A cluster increases computing capacity by distributing computing tasks across its servers.

In the related art, the servers in a computer cluster are usually powered on and off together. Once powered on, all servers stay on. A server that receives an assigned computing task performs the corresponding computation; a server that receives no task stays on and waits for one to be assigned.

In practice, the utilization of a computer cluster changes dynamically. Utilization may be low during some periods, while at other times a surge in tasks leaves resources scarce. When utilization is low, keeping every server powered on, as in the related art, wastes resources.

Summary of the invention

Embodiments of the present invention provide a server management method, apparatus, device, and storage medium that can dynamically power on an appropriate number of servers according to the actual operating conditions of a computer cluster, so that the power consumption of the whole cluster stays at a level matched to the computing tasks and resources are not wasted.

In a first aspect, an embodiment of the present invention provides a server management method, including:

acquiring the number of queued tasks and the number of idle servers in a computer cluster;

determining whether the number of queued tasks is greater than a preset task-count threshold, and whether the number of idle servers is less than a preset server count;

if the number of queued tasks is greater than the preset task-count threshold and the number of idle servers is less than the preset server count, acquiring a target available server from the bootable server list of the computer cluster, and controlling the target available server to perform a power-on operation;

wherein the available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process.

In a second aspect, an embodiment of the present invention further provides a server management apparatus, including:

a quantity acquisition module, configured to acquire the number of queued tasks and the number of idle servers in a computer cluster;

a quantity judgment module, configured to determine whether the number of queued tasks is greater than a preset task-count threshold, and whether the number of idle servers is less than a preset server count;

a server power-on module, configured to, if the number of queued tasks is greater than the preset task-count threshold and the number of idle servers is less than the preset server count, acquire a target available server from the bootable server list of the computer cluster and control the target available server to perform a power-on operation;

wherein the available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process.

In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the server management method described in the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the server management method described in the embodiments of the present invention.

In the technical solution of the embodiments of the present invention, the number of queued tasks and the number of idle servers in the computer cluster are acquired, and it is determined whether the number of queued tasks is greater than the preset task-count threshold and whether the number of idle servers is less than the preset server count. When both conditions hold, a target available server is acquired from the bootable server list of the computer cluster and controlled to perform a power-on operation. The two counts together indicate how scarce resources are for the cluster's current computing tasks; when they show that resources are scarce and servers need to be powered on automatically, an appropriate number of servers is powered on dynamically. Servers are thus powered on automatically according to the current resource scarcity, computing tasks are processed in time, the power consumption of the whole cluster stays at a level matched to the computing tasks, and resources are not wasted.

Brief description of the drawings

FIG. 1 is a flowchart of a server management method according to Embodiment 1 of the present invention.

FIG. 2 is a flowchart of a server management method according to Embodiment 2 of the present invention.

FIG. 3 is a flowchart of a server management method according to Embodiment 3 of the present invention.

FIG. 4 is a flowchart of a server management method according to Embodiment 4 of the present invention.

FIG. 5 is a schematic structural diagram of a server management apparatus according to Embodiment 5 of the present invention.

FIG. 6 is a schematic structural diagram of a computer device according to Embodiment 6 of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.

It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the whole. Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations can be rearranged. A process may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.

Embodiment 1

FIG. 1 is a flowchart of a server management method according to Embodiment 1 of the present invention. This embodiment is applicable to managing the servers in a computer cluster. The method may be performed by the server management apparatus provided by the embodiments of the present invention; the apparatus may be implemented in software and/or hardware and can generally be integrated in a computer device, for example the management server of a computer cluster. The management server is the server that manages all servers in the cluster. As shown in FIG. 1, the method of this embodiment specifically includes:

Step 101: Acquire the number of queued tasks and the number of idle servers of the computer cluster.

In this embodiment, the number of queued tasks of the computer cluster is the number of queued tasks of all users in the cluster. The number of idle servers is the number of all idle servers in the cluster. An idle server is a server whose computing resources are all idle. For example, the computing resources may be graphics processing unit (GPU) cards.

Optionally, acquiring the number of queued tasks and the number of idle servers of the computer cluster may include: acquiring them periodically at a preset power-on time interval.

The preset power-on time interval can be set according to business requirements; for example, it may be 15 minutes. Every 15 minutes, the number of queued tasks and the number of idle servers of the computer cluster are acquired; it is determined whether the number of queued tasks is greater than the preset task-count threshold and whether the number of idle servers is less than the preset server count; and, if both conditions hold, a target available server is acquired from the bootable server list of the computer cluster and controlled to perform a power-on operation, where the available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process. The automatic power-on process is thus executed once every preset power-on time interval.
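The periodic execution described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `run_periodically`, the `rounds` limit, and the injectable `sleep` parameter are assumptions introduced here so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable

# 15 minutes, the example interval given in the text.
POWER_ON_INTERVAL_S = 15 * 60


def run_periodically(flow: Callable[[], None],
                     interval_s: float,
                     rounds: int,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Run `flow` once per interval, `rounds` times in total.

    `flow` stands for one pass of the automatic power-on process.
    `rounds` and `sleep` are hypothetical parameters added for testability;
    a real service would more likely loop until stopped.
    """
    for i in range(rounds):
        flow()
        if i < rounds - 1:
            # Wait for the preset interval before the next pass.
            sleep(interval_s)
```

In a deployment the same effect could also be obtained with an external scheduler such as cron; the in-process loop is only one possible design.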

Optionally, the number of queued tasks and the number of idle servers of the computer cluster may be acquired through preset script commands for obtaining these two counts.
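The text does not say which script commands are used. As one possible sketch, assuming the cluster is scheduled by SLURM (which the description mentions later only in the context of configuration synchronization), the two counts could be derived from the output of `squeue` and `sinfo`. The specific commands shown in the comments and the function names are assumptions made for illustration.

```python
import subprocess


def run(cmd: list[str]) -> str:
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def count_pending_jobs(squeue_output: str) -> int:
    """Count non-empty lines, one per pending job.

    Assumed command: squeue -h -t PENDING -o %i
    (no header, pending state only, job id only).
    """
    return sum(1 for line in squeue_output.splitlines() if line.strip())


def count_idle_nodes(sinfo_output: str) -> int:
    """Count distinct node names.

    Assumed command: sinfo -h -N -t idle -o %N
    A node that belongs to several partitions is listed once per
    partition, so names are de-duplicated with a set.
    """
    return len({line.strip() for line in sinfo_output.splitlines() if line.strip()})
```

On a SLURM host the counts would then be obtained with, for example, `count_pending_jobs(run(["squeue", "-h", "-t", "PENDING", "-o", "%i"]))`. Whether SLURM's notion of an idle node matches the patent's definition of an idle server is an assumption that would need checking for a given cluster.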

Step 102: Determine whether the number of queued tasks is greater than the preset task-count threshold and whether the number of idle servers is less than the preset server count. If so, perform step 103; if not, perform step 104.

Optionally, the value of the preset task-count threshold may be determined according to how frequently tasks arrive. For example, the preset task-count threshold is 10. In real scenarios, users who submit unreasonable computing-resource requests to the servers can easily cause tasks to queue. Empirically, a queue of 10 tasks or fewer can be considered normal.

Determine whether the number of queued tasks is greater than the preset task-count threshold. If it is, the queue is longer than normal, resources for the current computing tasks are scarce, and servers need to be powered on automatically so that computing tasks are processed in time. If the number of queued tasks is less than or equal to the threshold, the queue is within the normal range, resources for the current computing tasks are not particularly scarce, and no server needs to be powered on automatically for the time being.

Optionally, the value of the preset server count may be determined according to the number of available machines. For example, the preset server count is 2. In real scenarios, a server is often equipped with 8 GPU cards; when only 1 server is idle and a user submits a computing task that needs 16 GPU cards, the task has to queue. Setting the preset server count to 2 therefore helps power on servers in time to provide resources for such a task.
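The arithmetic behind this example can be written out as a small sketch. It assumes, for illustration only, that a task runs exclusively on fully idle servers and that every server has the same number of GPU cards; the function names are hypothetical.

```python
def servers_needed(gpus_requested: int, gpus_per_server: int = 8) -> int:
    """Number of whole servers needed for a GPU request (ceiling division)."""
    return -(-gpus_requested // gpus_per_server)


def must_queue(gpus_requested: int, idle_servers: int,
               gpus_per_server: int = 8) -> bool:
    """True if the idle servers cannot cover the request.

    Assumption: only fully idle servers are used for the task.
    """
    return servers_needed(gpus_requested, gpus_per_server) > idle_servers
```

With 8 cards per server, a 16-card task needs 2 servers, so it queues when only 1 server is idle and can start when 2 are idle, which is the motivation the text gives for a preset server count of 2.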

Determine whether the number of queued tasks is greater than the preset task-count threshold and whether the number of idle servers is less than the preset server count. If both conditions hold, the queue is longer than normal and fewer idle servers are available than normal, so resources for the current computing tasks are scarce and servers need to be powered on automatically so that computing tasks are processed in time. If the number of queued tasks is less than or equal to the preset task-count threshold, or the number of idle servers is greater than or equal to the preset server count, resources for the current computing tasks are not particularly scarce, and no server needs to be powered on automatically for the time being.
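The combined condition can be expressed as a single predicate. The sketch below uses the example values from the text (threshold 10, server count 2) as defaults; the names `Thresholds` and `should_power_on` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    task_count_threshold: int = 10  # example value from the text
    server_count: int = 2           # example value from the text


def should_power_on(queued_tasks: int, idle_servers: int,
                    t: Thresholds = Thresholds()) -> bool:
    """True only when the queue is too long AND too few servers are idle."""
    return (queued_tasks > t.task_count_threshold
            and idle_servers < t.server_count)
```

Both comparisons are strict, so exactly 10 queued tasks, or exactly 2 idle servers, does not trigger a power-on.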

Using both the number of queued tasks and the number of idle servers to gauge how scarce resources are for the cluster's current computing tasks allows a more reasonable decision on whether servers need to be powered on automatically.

Optionally, the method may further include: determining whether an available server exists in the bootable server list of the computer cluster.

In this embodiment, this check may be performed before determining whether the number of queued tasks is greater than the preset task-count threshold and whether the number of idle servers is less than the preset server count. Optionally, it may instead be performed after that determination.

The bootable server list of the computer cluster records the available servers in the cluster that can be powered on. An available server is a server that can be powered on. Optionally, the available servers in the bootable server list may be servers that were successfully shut down in the automatic shutdown process.

If it is determined that the bootable server list of the computer cluster contains no available server, then no server could be powered on even if the subsequent flow called for it, so the flow can end. If it is determined that the list contains an available server, the subsequent flow may succeed in powering on a server, so the subsequent steps can proceed.
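One way to arrange the checks, with the bootable-list check placed first, is sketched below. The function name, the returned status strings, and the `power_on` callback are assumptions made for illustration; the text equally allows the list check to come after the threshold check.

```python
from typing import Callable


def power_on_flow(bootable: list[str], queued_tasks: int, idle_servers: int,
                  power_on: Callable[[str], None],
                  task_count_threshold: int = 10,
                  server_count: int = 2) -> str:
    """One pass of the automatic power-on flow, list check first."""
    if not bootable:
        # Nothing could be powered on, so end the flow early.
        return "ended: no bootable server"
    if not (queued_tasks > task_count_threshold and idle_servers < server_count):
        return "ended: resources sufficient"
    # Picking the first entry is an arbitrary choice for this sketch.
    power_on(bootable[0])
    return "power-on requested"
```

Checking the list first avoids evaluating the thresholds when the outcome could not lead to any action.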

Step 103: Acquire a target available server from the bootable server list of the computer cluster, and control the target available server to perform a power-on operation.

The available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process.

Optionally, the automatic shutdown process may be: acquiring the number of idle servers of the computer cluster; determining, according to the number of idle servers, whether the cluster satisfies the idle-server shutdown condition; and, if it does, acquiring a target idle server from the idle servers of the cluster, controlling the target idle server to perform a shutdown operation, and adding the target idle server to the bootable server list of the cluster once it has shut down successfully.
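The automatic shutdown process can be sketched as below. The passage does not define the idle-server shutdown condition, so the rule used here (more idle servers than a hypothetical `keep_idle` reserve) is purely an assumption, as are the function name and the `shut_down` callback that reports success.

```python
from typing import Callable, Optional


def auto_shutdown(idle_servers: list[str], bootable: list[str],
                  shut_down: Callable[[str], bool],
                  keep_idle: int = 2) -> Optional[str]:
    """One pass of the automatic shutdown flow.

    Returns the server that was shut down, or None if nothing changed.
    """
    # ASSUMED shutdown condition: more servers are idle than we want to keep.
    if len(idle_servers) <= keep_idle:
        return None
    target = idle_servers[0]  # arbitrary choice of target idle server
    if shut_down(target):
        # Only servers that shut down successfully become bootable later.
        bootable.append(target)
        return target
    return None
```

Adding a server to the list only after a confirmed shutdown is what lets the power-on flow treat every list entry as a machine that is actually off and can be started.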

If the number of queued tasks is greater than the preset task-count threshold and the number of idle servers is less than the preset server count, a target available server is acquired from the bootable server list of the computer cluster and controlled to perform a power-on operation.

Optionally, acquiring a target available server from the bootable server list of the computer cluster and controlling it to perform a power-on operation may include: selecting one available server from the bootable server list as the target available server; and controlling the target available server to perform the power-on operation through an Intelligent Platform Management Interface (IPMI) command.
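The text names IPMI but not a specific tool. A minimal sketch, assuming the widely used `ipmitool` command-line client and a baseboard management controller (BMC) reachable over the network, could build the power-on command as follows. The function name, the host and user values, and the choice of the `lanplus` interface are illustrative assumptions.

```python
def ipmi_power_on_cmd(bmc_host: str, user: str) -> list[str]:
    """Build an ipmitool command that powers on a server through its BMC.

    -I lanplus  use the IPMI v2.0 LAN interface
    -E          read the password from the IPMI_PASSWORD environment
                variable instead of passing it on the command line
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", "on"]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a management server where `ipmitool` is installed and `IPMI_PASSWORD` is set; how BMC addresses map to entries in the bootable server list is not specified in the source and would need its own lookup.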

Optionally, after controlling the target available server to perform the power-on operation, the method may further include: after waiting for a preset power-on time period, determining whether the target available server has powered on successfully; if it has, performing an initialization operation on the target available server; after waiting for a preset initialization time period, determining whether the target available server has been initialized successfully; and, if it has, writing the normal-online information of the target available server to a log file.

The preset power-on time period can be determined according to the time a server needs to power on. For example, a server power-on operation usually takes 5 minutes, so the preset power-on time period is set to 5 minutes. The preset initialization time period can be determined according to the time a server needs for initialization. The normal-online information records that the target available server successfully completed the power-on and initialization operations in the current automatic power-on process and came online normally.

Optionally, the target available server is tested with the PING (Packet Internet Groper) network diagnostic command to determine whether it has powered on successfully.
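A reachability check of this kind can be sketched with the system `ping` command. The flags below (`-c` for the number of echo requests, `-W` for the reply timeout in seconds) are those of the Linux `ping` utility and differ on other platforms; the function names and the injectable `run` parameter are assumptions added so the logic can be tested without network access.

```python
import subprocess
from typing import Callable


def ping_cmd(host: str, timeout_s: int = 2) -> list[str]:
    """Build a Linux ping command that sends one echo request."""
    return ["ping", "-c", "1", "-W", str(timeout_s), host]


def is_reachable(host: str, run: Callable = subprocess.run) -> bool:
    """True if the host answered the ping (exit status 0)."""
    result = run(ping_cmd(host),
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

A successful ping shows only that the network stack is up, which is why the text follows it with a separate initialization check before the server is treated as online.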

Optionally, if the target available server did not power on successfully, the power-on failure information of the target available server is written to the log file, so that operations staff, when reviewing the log file periodically, can intervene manually and maintain the server on the basis of that information. The power-on failure information records that the target available server did not power on successfully in the current automatic power-on process.

Optionally, the initialization operation may include checking the memory swap partition (swap), synchronizing the configuration file of the SLURM resource management system, initializing the graphics cards, checking storage mounts, and checking that the scheduling system services are running normally.

Optionally, if the target available server was not initialized successfully, the initialization failure information of the target available server is written to the log file, so that operations staff, when reviewing the log file periodically, can intervene manually and maintain the server on the basis of that information. The initialization failure information records that the target available server was not initialized successfully in the current automatic power-on process.
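The post-power-on sequence (wait, verify boot, initialize, wait, verify initialization, log the outcome) can be sketched as follows. The 5-minute boot wait comes from the text; the 60-second initialization wait, the logger name, the status strings, and the callback parameters are assumptions made for illustration.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("auto_power_on")  # hypothetical logger name


def bring_online(server: str,
                 is_booted: Callable[[str], bool],
                 initialize: Callable[[str], None],
                 is_initialized: Callable[[str], bool],
                 boot_wait_s: float = 5 * 60,   # 5 minutes, from the text
                 init_wait_s: float = 60,       # ASSUMED value
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Verify power-on, run initialization, and log the result."""
    sleep(boot_wait_s)
    if not is_booted(server):
        log.error("%s: power-on did not succeed", server)
        return "boot-failed"
    initialize(server)
    sleep(init_wait_s)
    if not is_initialized(server):
        log.error("%s: initialization did not succeed", server)
        return "init-failed"
    log.info("%s: powered on and initialized, online normally", server)
    return "online"
```

Both failure branches only log and return, which matches the text's approach of leaving recovery to operations staff who review the log file.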

Step 104: Write the current resource status information of the computer cluster to the log file.

In this embodiment, if the number of queued tasks is less than or equal to the preset task-count threshold, or the number of idle servers is greater than or equal to the preset server count, the current resource status information of the computer cluster is written to the log file, so that operations staff, when reviewing the log file periodically, can determine from it the resource status of the cluster during the current automatic power-on process.

Optionally, the current resource status information of the computer cluster includes the number of queued tasks and the number of idle servers of the computer cluster.
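A log entry carrying these two counts could be formatted as in the sketch below. The line layout, the field names, and the `action=none` marker are hypothetical; the text only requires that the two counts be recorded.

```python
from datetime import datetime


def resource_log_line(queued_tasks: int, idle_servers: int,
                      now: datetime) -> str:
    """Format one log line describing the cluster's current resources."""
    stamp = now.isoformat(timespec="seconds")
    return (f"{stamp} queued_tasks={queued_tasks} "
            f"idle_servers={idle_servers} action=none")
```

The line can be appended to the log file with the standard `logging` module or a plain file write; a key=value layout is chosen here only because it is easy to search when staff review the file.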

This embodiment of the present invention provides a server management method in which the number of queued tasks and the number of idle servers in the computer cluster are acquired, and it is determined whether the number of queued tasks is greater than the preset task-count threshold and whether the number of idle servers is less than the preset server count. When both conditions hold, a target available server is acquired from the bootable server list of the computer cluster and controlled to perform a power-on operation. The two counts together indicate how scarce resources are for the cluster's current computing tasks; when they show that resources are scarce and servers need to be powered on automatically, an appropriate number of servers is powered on dynamically. Servers are thus powered on automatically according to the current resource scarcity, computing tasks are processed in time, the power consumption of the whole cluster stays at a level matched to the computing tasks, and resources are not wasted.

Embodiment 2

FIG. 2 is a flowchart of a server management method provided by Embodiment 2 of the present invention. This embodiment may be combined with the optional solutions of one or more of the foregoing embodiments. In this embodiment, the server management method may further include: judging whether an available server exists in the bootable server list of the computer cluster. In addition, after controlling the target idle server to perform the power-on operation, the method may further include: after waiting for a preset power-on time period, judging whether the target available server has powered on successfully; if the target available server has powered on successfully, performing an initialization operation on the target available server; after waiting for a preset initialization time period, judging whether the target available server has been initialized successfully; and if the target available server has been initialized successfully, writing the normal-online information of the target available server to a log file.

As shown in FIG. 2, the method of this embodiment specifically includes:

Step 201: Acquire the number of queued tasks and the number of idle servers of the computer cluster.

For details not fully described in this embodiment, refer to the foregoing embodiments.

Step 202: Judge whether an available server exists in the bootable server list of the computer cluster. If yes, go to step 203; if no, end the process.

In this embodiment, the bootable server list of the computer cluster maintains the available servers in the cluster that can be powered on. An available server is a server that can be powered on. Optionally, the available servers in the bootable server list may be servers that were successfully shut down in the automatic shutdown process.

If it is determined that no available server exists in the bootable server list of the computer cluster, then even if a server needed to be powered on automatically later in the process, that could not be achieved, so the process can end. If it is determined that an available server exists in the bootable server list of the computer cluster, a server can potentially be powered on successfully later in the process, so the subsequent steps can continue.

Step 203: Judge whether the number of queued tasks is greater than the preset task-count threshold and whether the number of idle servers is less than the preset number of servers. If yes, go to step 204; if no, go to step 211.

In this embodiment, it is judged whether the number of queued tasks is greater than the preset task-count threshold and whether the number of idle servers is less than the preset number of servers. If the number of queued tasks is greater than the preset task-count threshold and the number of idle servers is less than the preset number of servers, the number of queued tasks is above its normal value and the number of available idle servers is below its normal value. Resources for the computing tasks currently requesting them are tight, and servers need to be powered on automatically to ensure that computing tasks are processed in time. If the number of queued tasks is less than or equal to the preset task-count threshold, or the number of idle servers is greater than or equal to the preset number of servers, resources for the computing tasks currently requesting them are not particularly tight, and for the time being no server needs to be powered on automatically to ensure timely processing.

Considering the number of queued tasks and the number of idle servers together when determining how scarce resources are for the current computing tasks allows a more reasonable judgment of whether servers need to be powered on automatically.
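The judgment in step 203 reduces to a two-part boolean condition. The sketch below is illustrative; the function and parameter names are assumptions, and only the comparison operators come from the text.

```python
def needs_power_on(
    queued_tasks: int,
    idle_servers: int,
    task_threshold: int,
    server_threshold: int,
) -> bool:
    """Return True only when BOTH conditions of step 203 hold.

    queued_tasks  >  task_threshold    (strictly greater)
    idle_servers  <  server_threshold  (strictly less)
    """
    return queued_tasks > task_threshold and idle_servers < server_threshold
```

Because both comparisons are strict, a value equal to its threshold routes the process to step 211 (logging only) rather than step 204.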

Step 204: Obtain a target available server from the bootable server list of the computer cluster, and control the target idle server to perform a power-on operation.

Optionally, obtaining a target available server from the bootable server list of the computer cluster and controlling the target idle server to perform a power-on operation may include: selecting one available server from the bootable server list of the computer cluster as the target available server; and controlling the target idle server to perform the power-on operation by means of an IPMI command.
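One common way to issue an IPMI power-on command is the `ipmitool` command-line utility. The text does not name a tool, so the use of `ipmitool`, the `lanplus` interface, and the host and credential parameters below are all assumptions made for illustration.

```python
import subprocess
from typing import List


def build_power_on_command(bmc_host: str, user: str, password: str) -> List[str]:
    """Build an ipmitool invocation that asks the BMC to power the chassis on.

    bmc_host is the address of the server's management controller, which is
    usually different from the address of the server's operating system.
    """
    return [
        "ipmitool",
        "-I", "lanplus",   # IPMI v2.0 over LAN
        "-H", bmc_host,
        "-U", user,
        "-P", password,
        "chassis", "power", "on",
    ]


def power_on(bmc_host: str, user: str, password: str) -> None:
    """Run the command; raises CalledProcessError if ipmitool reports failure."""
    subprocess.run(build_power_on_command(bmc_host, user, password), check=True)
```

A zero exit status only means the management controller accepted the request; whether the server actually came up is checked separately in step 205.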

Step 205: After waiting for the preset power-on time period, judge whether the target available server has powered on successfully. If yes, go to step 206; if no, go to step 210.

In this embodiment, the preset power-on time period may be determined according to the time a server needs to complete the power-on operation. For example, if powering on a server usually takes 5 minutes, the preset power-on time period is set to 5 minutes.

Optionally, the target available server is tested with a PING command to judge whether it has powered on successfully.
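A reachability test of this kind might look like the sketch below. It assumes the Linux `ping` utility, where `-c` sets the packet count and `-W` the reply timeout in seconds; the function names and the injectable `run` parameter are illustrative and not part of the described method.

```python
import subprocess
from typing import Callable, List


def _run_quiet(cmd: List[str]) -> int:
    """Run a command, discard its output, and return its exit status."""
    return subprocess.call(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )


def is_reachable(host: str, run: Callable[[List[str]], int] = _run_quiet) -> bool:
    """Send one echo request and treat exit status 0 as 'powered on'."""
    cmd = ["ping", "-c", "1", "-W", "2", host]
    return run(cmd) == 0
```

Passing the runner as a parameter lets the check be exercised without touching the network. A successful ping shows only that the network stack is up, which is why the method goes on to a separate initialization check.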

Step 206: Perform an initialization operation on the target available server.

Optionally, the initialization operation may include a swap partition (SWAP) check, synchronization of the configuration files of the SLURM resource management system, graphics card initialization, a storage mount check, and a check that the scheduling system services are running normally.
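The initialization checks listed above can be organized as a set of named checks that are run in turn. The sketch below shows only that structure; the check names are hypothetical placeholders, and the real checks (swap, SLURM configuration, GPU, storage, scheduler services) are not implemented here because the text does not describe how each is performed.

```python
from typing import Callable, Dict, List


def run_init_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check and return the names of those that failed.

    A check that raises an exception is counted as a failure so that one
    broken check does not hide the results of the others.
    """
    failed: List[str] = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed
```

An empty result means every check passed, which corresponds to the "initialized successfully" branch of step 207.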

Step 207: After waiting for the preset initialization time period, judge whether the target available server has been initialized successfully. If yes, go to step 208; if no, go to step 209.

In this embodiment, the preset initialization time period may be determined according to the time a server needs to complete the initialization operation.

Step 208: Write the normal-online information of the target available server to the log file.

In this embodiment, the normal-online information records that the target available server successfully completed the power-on operation and the initialization operation in the current automatic power-on process and came online normally.

Step 209: Write the initialization-failure information of the target available server to the log file.

In this embodiment, if the target available server has not been initialized successfully, its initialization-failure information is written to the log file, so that operations staff, when periodically reviewing the log file, can use this information to intervene manually and maintain the target available server. The initialization-failure information records that the target available server was not initialized successfully in the current automatic power-on process.

Step 210: Write the power-on-failure information of the target available server to the log file.

In this embodiment, if the target available server has not powered on successfully, its power-on-failure information is written to the log file, so that operations staff, when periodically reviewing the log file, can use this information to intervene manually and maintain the target available server. The power-on-failure information records that the target available server did not power on successfully in the current automatic power-on process.

Step 211: Write the current resource status information of the computer cluster to the log file.

In this embodiment, if the number of queued tasks is less than or equal to the preset task-count threshold, or the number of idle servers is greater than or equal to the preset number of servers, the current resource status information of the computer cluster is written to the log file. When operations staff periodically review the log file, they can use this information to determine the resource status of the computer cluster during the current automatic power-on process.

This embodiment of the present invention provides a server management method. The method acquires the number of queued tasks and the number of idle servers of a computer cluster. When it is determined that an available server exists in the bootable server list of the computer cluster, the method judges whether the number of queued tasks is greater than the preset task-count threshold and whether the number of idle servers is less than the preset number of servers. When both conditions hold, a target available server is obtained from the bootable server list, the target idle server is controlled to perform a power-on operation, and corresponding information is written to the log file according to the power-on result and the initialization result of the target available server. The number of queued tasks and the number of idle servers together indicate how scarce resources are for the computing tasks currently requesting them. When these two numbers show that resources are tight and servers need to be powered on automatically, a suitable number of servers is powered on dynamically. Servers are thus powered on automatically according to the current scarcity of resources, which ensures that computing tasks are processed in time, keeps the power consumption of the whole cluster at a level commensurate with the computing tasks, and avoids wasting resources. Because information corresponding to each server's power-on result and initialization result is written to the log file, operations staff, when periodically reviewing the log file, can intervene manually and maintain the target available server based on what the log records.
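Steps 201 to 211 can be read as a single control flow. The sketch below follows that flow under several assumptions: the hardware-facing operations are passed in as callables, the first entry of the bootable list is taken as the target (the text does not state a selection rule), the 300-second power-on wait mirrors the 5-minute example, and the 60-second initialization wait is an arbitrary placeholder.

```python
import logging
import time
from typing import Callable, List

logger = logging.getLogger("auto_power_on")  # hypothetical logger name


def auto_power_on(
    queued_tasks: int,                 # step 201 input
    idle_servers: int,                 # step 201 input
    bootable: List[str],
    task_threshold: int,
    server_threshold: int,
    power_on: Callable[[str], None],
    is_up: Callable[[str], bool],
    initialize: Callable[[str], None],
    is_initialized: Callable[[str], bool],
    boot_wait_s: float = 300.0,
    init_wait_s: float = 60.0,         # assumed value, not from the text
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run one pass of the power-on process and return its outcome."""
    if not bootable:                                   # step 202
        return "no_bootable_server"
    if not (queued_tasks > task_threshold
            and idle_servers < server_threshold):      # step 203
        logger.info("resource status: queued=%d idle=%d",
                    queued_tasks, idle_servers)        # step 211
        return "resources_sufficient"

    target = bootable[0]                               # step 204 (assumed rule)
    power_on(target)
    sleep(boot_wait_s)                                 # step 205
    if not is_up(target):
        logger.warning("power-on failed: %s", target)  # step 210
        return "power_on_failed"

    initialize(target)                                 # step 206
    sleep(init_wait_s)                                 # step 207
    if not is_initialized(target):
        logger.warning("initialization failed: %s", target)  # step 209
        return "init_failed"

    logger.info("server online: %s", target)           # step 208
    return "online"
```

Injecting `sleep` and the four operations keeps the flow testable without real waits or hardware.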

Embodiment 3

FIG. 3 is a flowchart of a server management method provided by Embodiment 3 of the present invention. This embodiment may be combined with the optional solutions of one or more of the foregoing embodiments. In this embodiment, the server management method may further include: acquiring the number of idle servers of the computer cluster; judging, according to the number of idle servers, whether the computer cluster satisfies an idle server shutdown condition; and if the computer cluster satisfies the idle server shutdown condition, obtaining a target idle server from the idle servers of the computer cluster, controlling the target idle server to perform a shutdown operation, and adding the target idle server that was shut down successfully to the bootable server list of the computer cluster.

As shown in FIG. 3, the method of this embodiment specifically includes:

Step 301: Acquire the number of idle servers of the computer cluster.

In this embodiment, the number of idle servers is the number of all idle servers in the computer cluster. An idle server is a server in which all computing resources are idle. For example, the computing resources may be GPU cards.

Optionally, acquiring the number of idle servers of the computer cluster may include: acquiring the number of idle servers of the computer cluster periodically, at a preset shutdown time interval.

The preset shutdown time interval can be set according to business needs. For example, the preset shutdown time interval may be one day. In that case, once a day, the number of idle servers of the computer cluster is acquired; whether the computer cluster satisfies the idle server shutdown condition is judged according to the number of idle servers; and if the condition is satisfied, a target idle server is obtained from the idle servers of the computer cluster, the target idle server is controlled to perform a shutdown operation, and the target idle server that was shut down successfully is added to the bootable server list of the computer cluster. In this way, the automatic shutdown process is executed once every preset shutdown time interval.

Optionally, because the shutdown process should not be executed frequently, the preset shutdown time interval is longer than the preset power-on time interval described earlier.

Experience shows that users usually submit the fewest computing tasks in the hours around midnight each day, when the utilization of the computer cluster is lowest, so this is often the most suitable time to trigger the automatic shutdown process. The number of idle servers of the computer cluster can therefore be acquired in the early hours of each day according to the preset shutdown time interval. Specifically, whether the current system time has crossed into a new day can serve as the trigger condition for the automatic shutdown process, so that the process is executed once in the early hours of each day at the preset shutdown time interval.
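The day-crossing trigger described above can be sketched as a small stateful check that fires the first time it is polled on a new calendar day. The class and method names are assumptions, and the polling loop that would call it is omitted.

```python
from datetime import date, datetime


class DailyTrigger:
    """Fire once each time the system date advances past the last seen date."""

    def __init__(self, start: date) -> None:
        self._last_day = start

    def should_run(self, now: datetime) -> bool:
        """Return True on the first call made after midnight has passed."""
        today = now.date()
        if today > self._last_day:
            self._last_day = today
            return True
        return False
```

If the trigger is polled at the shorter power-on interval, the shutdown process starts shortly after midnight, which matches the low-utilization window the text describes.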

Optionally, the number of idle servers of the computer cluster may be acquired with a preset script command for obtaining the number of idle servers.

Step 302: Judge, according to the number of idle servers, whether the computer cluster satisfies the idle server shutdown condition. If yes, go to step 303; if no, go to step 304.

Optionally, judging, according to the number of idle servers, whether the computer cluster satisfies the idle server shutdown condition may include: judging whether the number of idle servers is greater than a preset idle-server-count threshold; if yes, determining that the computer cluster satisfies the idle server shutdown condition; if no, determining that the computer cluster does not satisfy the idle server shutdown condition.

The preset idle-server-count threshold can be set according to business needs. For example, the preset idle-server-count threshold is 5.

It is judged whether the number of idle servers is greater than the preset idle-server-count threshold. If the number of idle servers is greater than the threshold, the number of idle servers in the computer cluster is above its normal value and too many idle servers are being kept powered on. The surplus idle servers need to be shut down to avoid wasting resources, so it is determined that the computer cluster satisfies the idle server shutdown condition. If the number of idle servers is less than or equal to the threshold, the number of idle servers in the computer cluster is at or below its normal value and there is no surplus of idle servers kept powered on. For the time being no idle server needs to be shut down to avoid wasting resources, so it is determined that the computer cluster does not satisfy the idle server shutdown condition.

Step 303: Obtain a target idle server from the idle servers of the computer cluster, control the target idle server to perform a shutdown operation, and add the target idle server that was shut down successfully to the bootable server list of the computer cluster.

Optionally, obtaining a target idle server from the idle servers of the computer cluster and controlling the target idle server to perform a shutdown operation may include: calculating the difference between the number of idle servers and the preset idle-server-count threshold; obtaining, from the idle servers of the computer cluster, a number of idle servers equal to the difference as target idle servers, and removing the target idle servers from the resource pool; and performing the shutdown operation on the removed target idle servers.

The resource pool includes the servers in the computer cluster that are kept powered on. In this embodiment, a target idle server that is to be shut down is removed from the resource pool promptly.

Optionally, the shutdown operation on a removed target idle server is completed by executing a shutdown function on that server.

In a specific example, the preset idle-server-count threshold is 5 and the number of idle servers is 7. Because the number of idle servers is greater than 5, the number of idle servers in the computer cluster is above its normal value and too many idle servers are being kept powered on. The surplus idle servers need to be shut down to avoid wasting resources, so it is determined that the computer cluster satisfies the idle server shutdown condition. The difference between the number of idle servers and the preset idle-server-count threshold is 2. Two idle servers are obtained from the idle servers of the computer cluster as target idle servers. The target idle servers are removed from the resource pool, and the shutdown operation is performed on the removed target idle servers.
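The arithmetic of this example (7 idle servers, threshold 5, difference 2) can be expressed as a short selection function. The server names are invented for illustration, and taking servers from the front of the list is an assumption, since the text does not say which idle servers are chosen.

```python
from typing import List


def select_shutdown_targets(idle: List[str], idle_threshold: int) -> List[str]:
    """Return as many idle servers as exceed the threshold, or none."""
    surplus = len(idle) - idle_threshold
    if surplus <= 0:
        return []
    return idle[:surplus]


if __name__ == "__main__":
    idle_servers = [f"node{i}" for i in range(1, 8)]  # 7 idle servers
    print(select_shutdown_targets(idle_servers, 5))
```

Shutting down only the surplus leaves exactly the threshold number of idle servers powered on to absorb newly submitted tasks.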

The bootable server list of the computer cluster maintains the available servers in the cluster that can be powered on. An available server is a server that can be powered on. A target idle server that has been shut down successfully is an available server in the cluster that can be powered on. Therefore, the target idle servers that are successfully shut down in the automatic shutdown process are added to the bootable server list of the computer cluster.

Step 304: Write the current resource status information of the computer cluster to the log file.

In this embodiment, if the computer cluster does not satisfy the idle server shutdown condition, the current resource status information of the computer cluster is written to the log file. When operations staff periodically review the log file, they can use this information to determine the resource status of the computer cluster during the current automatic shutdown process.

Optionally, the current resource status information of the computer cluster includes the number of idle servers of the computer cluster.

This embodiment of the present invention provides a server management method. The method acquires the number of idle servers of a computer cluster and judges, according to that number, whether the computer cluster satisfies the idle server shutdown condition. When the condition is satisfied, a target idle server is obtained from the idle servers of the computer cluster, the target idle server is controlled to perform a shutdown operation, and the target idle server that was shut down successfully is added to the bootable server list of the computer cluster. Surplus idle servers can thus be shut down dynamically according to the number of idle servers. Servers are shut down automatically according to how idle the servers in the cluster are, which saves power, keeps the power consumption of the whole computer cluster at a level commensurate with the computing tasks, and avoids wasting resources.

Embodiment 4

FIG. 4 is a flowchart of a server management method provided by Embodiment 4 of the present invention. This embodiment may be combined with the optional solutions of one or more of the foregoing embodiments. In this embodiment, judging, according to the number of idle servers, whether the computer cluster satisfies the idle server shutdown condition may include: judging whether the number of idle servers is greater than a preset idle-server-count threshold; if yes, determining that the computer cluster satisfies the idle server shutdown condition; if no, determining that the computer cluster does not satisfy the idle server shutdown condition.

In addition, obtaining a target idle server from the idle servers of the computer cluster and controlling the target idle server to perform a shutdown operation may include: calculating the difference between the number of idle servers and the preset idle-server-count threshold; obtaining, from the idle servers of the computer cluster, a number of idle servers equal to the difference as target idle servers, and removing the target idle servers from the resource pool; and performing the shutdown operation on the removed target idle servers.

As shown in FIG. 4, the method of this embodiment specifically includes:

Step 401: Acquire the number of idle servers of the computer cluster.

For details not fully described in this embodiment, refer to the foregoing embodiments.

Step 402: Judge whether the number of idle servers is greater than the preset idle-server-count threshold. If yes, go to step 403; if no, go to step 406.

In this embodiment, the preset idle-server-count threshold can be set according to business needs. For example, the preset idle-server-count threshold is 5.

If the number of idle servers is greater than the preset idle-server-count threshold, the number of idle servers in the computer cluster is above its normal value and too many idle servers are being kept powered on. The surplus idle servers need to be shut down to avoid wasting resources, so it is determined that the computer cluster satisfies the idle server shutdown condition. If the number of idle servers is less than or equal to the threshold, the number of idle servers in the computer cluster is at or below its normal value and there is no surplus of idle servers kept powered on. For the time being no idle server needs to be shut down to avoid wasting resources, so it is determined that the computer cluster does not satisfy the idle server shutdown condition.

Step 403: Calculate the difference between the number of idle servers and the preset idle-server-count threshold.

Step 404: Obtain, from the idle servers of the computer cluster, a number of idle servers equal to the difference as target idle servers, and remove the target idle servers from the resource pool.

Step 405: Perform the shutdown operation on the removed target idle servers, and add the target idle servers that were shut down successfully to the bootable server list of the computer cluster.

Optionally, the shutdown operation on a removed target idle server is completed by executing a shutdown function on that server.

Step 406: Write the current resource status information of the computer cluster to the log file.

This embodiment of the present invention provides a server management method. The method judges whether the number of idle servers is greater than the preset idle-server-count threshold in order to determine whether the computer cluster satisfies the idle server shutdown condition. When the number of idle servers is greater than the threshold, it is determined that the condition is satisfied. The difference between the number of idle servers and the threshold is then calculated, a number of idle servers equal to the difference is obtained from the idle servers of the computer cluster as target idle servers, the target idle servers are removed from the resource pool, and the shutdown operation is performed on the removed target idle servers. Surplus idle servers can thus be shut down dynamically according to the number of idle servers and the preset idle-server-count threshold. Servers are shut down automatically according to how idle the servers in the cluster are, which saves power, keeps the power consumption of the whole computer cluster at a level commensurate with the computing tasks, and avoids wasting resources.
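Steps 401 to 406 can be sketched end to end as follows. The representation of the resource pool as a set, the bootable list as a Python list, and the `shut_down` callable that reports success are all assumptions; the text specifies the order of operations but not the data structures.

```python
import logging
from typing import Callable, List, Set

logger = logging.getLogger("auto_shutdown")  # hypothetical logger name


def auto_shutdown(
    idle: List[str],                    # step 401: currently idle servers
    resource_pool: Set[str],            # servers kept powered on
    bootable: List[str],                # bootable server list
    idle_threshold: int,
    shut_down: Callable[[str], bool],   # returns True on successful shutdown
) -> List[str]:
    """Run one pass of the shutdown process; return servers shut down."""
    if len(idle) <= idle_threshold:                    # step 402
        logger.info("resource status: idle=%d", len(idle))  # step 406
        return []

    surplus = len(idle) - idle_threshold               # step 403
    targets = idle[:surplus]                           # step 404 (assumed rule)
    for server in targets:
        resource_pool.discard(server)   # remove before powering off

    done: List[str] = []
    for server in targets:                             # step 405
        if shut_down(server):
            bootable.append(server)     # only successful shutdowns are listed
            done.append(server)
        else:
            logger.warning("shutdown failed: %s", server)
    return done
```

Removing targets from the pool before shutting them down means the scheduler stops treating them as usable while the shutdown is still in progress.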

Embodiment 5

FIG. 5 is a schematic structural diagram of a server management apparatus provided by Embodiment 5 of the present invention. As shown in FIG. 5, the apparatus includes a quantity acquisition module 501, a quantity judgment module 502, and a server power-on module 503.

The quantity acquisition module 501 is configured to acquire the number of queued tasks and the number of idle servers of a computer cluster. The quantity judgment module 502 is configured to judge whether the number of queued tasks is greater than a preset task-count threshold and whether the number of idle servers is less than a preset number of servers. The server power-on module 503 is configured to, if the number of queued tasks is greater than the preset task-count threshold and the number of idle servers is less than the preset number of servers, obtain a target available server from the bootable server list of the computer cluster and control the target idle server to perform a power-on operation. The available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process.
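The division into three modules can be illustrated with a small composition in which each module is a replaceable callable. This is a structural sketch only: the class name, field names, and return conventions are assumptions and do not come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ServerManagementApparatus:
    # Module 501: returns (queued_tasks, idle_servers).
    acquire_counts: Callable[[], Tuple[int, int]]
    # Module 502: applies the two threshold comparisons.
    should_power_on: Callable[[int, int], bool]
    # Module 503: picks a target from the bootable list and powers it on,
    # returning its name, or None if no server was available.
    power_on_one: Callable[[], Optional[str]]

    def run_once(self) -> Optional[str]:
        """Chain the three modules in the order shown in FIG. 5."""
        queued, idle = self.acquire_counts()
        if self.should_power_on(queued, idle):
            return self.power_on_one()
        return None
```

Keeping each module behind its own callable lets the acquisition source or the power-on mechanism be replaced without changing the judgment logic.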

This embodiment of the present invention provides a server management apparatus. The apparatus acquires the number of queued tasks and the number of idle servers in a computer cluster, and then judges whether the number of queued tasks is greater than a preset task-number threshold and whether the number of idle servers is less than a preset number of servers. When both conditions hold, it obtains a target available server from the bootable server list of the computer cluster and controls the target idle server to perform a power-on operation. The number of queued tasks and the number of idle servers together indicate how scarce resources are for the computing tasks currently requesting them. When these two numbers show that resources are tight and servers need to be started automatically, an appropriate number of servers is powered on dynamically. Servers are thus started automatically according to the cluster's current resource pressure, which ensures that computing tasks are processed promptly, keeps the power consumption of the whole cluster at a level commensurate with the computing workload, and avoids wasting resources.
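The power-on decision made by modules 501 to 503 can be sketched as a single function. This is an illustrative sketch under assumptions: the function and parameter names are hypothetical, the bootable server list is modeled as a plain Python list, and the first entry is taken as the target because the embodiment does not say how the target is chosen.

```python
def pick_server_to_power_on(queued_tasks, idle_servers, task_threshold,
                            preset_server_count, bootable_servers):
    """Return the server to power on, or None if no power-on is needed.

    A server is powered on only when BOTH conditions hold:
      * queued tasks  >  preset task-number threshold
      * idle servers  <  preset number of servers
    """
    if queued_tasks <= task_threshold:
        return None
    if idle_servers >= preset_server_count:
        return None
    if not bootable_servers:
        # Nothing was shut down by the automatic shutdown process, so there
        # is no server available to start.
        return None
    # Assumption: take the first entry of the bootable server list.
    return bootable_servers[0]


target = pick_server_to_power_on(
    queued_tasks=12, idle_servers=0, task_threshold=10,
    preset_server_count=1, bootable_servers=["node-07", "node-09"],
)
print(target)  # node-07
```

Requiring both conditions avoids starting a server when tasks are queued for a reason other than a shortage of idle machines.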

In an optional implementation of this embodiment, the server management apparatus may further include a server judgment module configured to judge whether an available server exists in the bootable server list of the computer cluster.

In an optional implementation of this embodiment, the quantity acquisition module 501 may include a timed quantity acquisition unit configured to acquire the number of queued tasks and the number of idle servers in the computer cluster periodically, at a preset power-on time interval.
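Periodic acquisition at a preset interval can be sketched as a bounded polling loop. The names `poll_cluster`, `read_counts`, and `on_sample` are hypothetical, and the bounded number of rounds and the injectable `sleep` function are assumptions made so that the sketch terminates and can be exercised without waiting.

```python
import time


def poll_cluster(read_counts, on_sample, interval_seconds, max_rounds,
                 sleep=time.sleep):
    """Sample the cluster `max_rounds` times, `interval_seconds` apart.

    `read_counts` returns a (queued_tasks, idle_servers) tuple.
    `on_sample` receives each sample, e.g. to run the power-on decision.
    """
    for round_index in range(max_rounds):
        queued_tasks, idle_servers = read_counts()
        on_sample(queued_tasks, idle_servers)
        # Wait between rounds, but not after the final one.
        if round_index < max_rounds - 1:
            sleep(interval_seconds)


# Usage with stubbed cluster state and a no-op sleep.
samples = []
poll_cluster(
    read_counts=lambda: (12, 0),
    on_sample=lambda queued, idle: samples.append((queued, idle)),
    interval_seconds=60,
    max_rounds=2,
    sleep=lambda seconds: None,
)
print(samples)  # [(12, 0), (12, 0)]
```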

In an optional implementation of this embodiment, the server management apparatus may further include: a power-on judgment module configured to judge, after waiting for a preset power-on time period, whether the target available server has powered on successfully; a server initialization module configured to initialize the target available server if it has powered on successfully; an initialization judgment module configured to judge, after waiting for a preset initialization time period, whether the target available server has been initialized successfully; and an information writing module configured to write the normal-online information of the target available server to a log file if the initialization has succeeded.
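The wait, check, initialize, check, and log sequence can be sketched as follows. All names are hypothetical: `is_powered_on`, `initialize`, and `is_initialized` stand in for whatever mechanism the cluster actually uses, and Python's standard `logging` module is used here as one possible way to write the online record to a log file.

```python
import logging
import time


def bring_online(server, is_powered_on, initialize, is_initialized,
                 boot_wait_seconds, init_wait_seconds,
                 sleep=time.sleep, logger=logging.getLogger("cluster")):
    """Return True if the server booted, initialized, and was logged online."""
    # Step 1: wait for the preset power-on period, then check the boot result.
    sleep(boot_wait_seconds)
    if not is_powered_on(server):
        return False
    # Step 2: initialize only a server that powered on successfully.
    initialize(server)
    # Step 3: wait for the preset initialization period, then check the result.
    sleep(init_wait_seconds)
    if not is_initialized(server):
        return False
    # Step 4: record the normal-online information.
    logger.info("server %s is online", server)
    return True


# Usage with stubs that always succeed and a no-op sleep.
ok = bring_online(
    "node-07",
    is_powered_on=lambda s: True,
    initialize=lambda s: None,
    is_initialized=lambda s: True,
    boot_wait_seconds=120,
    init_wait_seconds=60,
    sleep=lambda seconds: None,
)
print(ok)  # True
```

Each check returns early on failure, so a server that did not boot is never initialized and is never recorded as online.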

In an optional implementation of this embodiment, the server management apparatus may further include: an idle quantity acquisition module configured to acquire the number of idle servers in the computer cluster; a shutdown condition judgment module configured to judge, according to the number of idle servers, whether the computer cluster satisfies the idle-server shutdown condition; and a server shutdown module configured to, if the computer cluster satisfies the idle-server shutdown condition, obtain target idle servers from the idle servers of the computer cluster, control the target idle servers to perform a shutdown operation, and add each target idle server that has shut down successfully to the bootable server list of the computer cluster.
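The link between the shutdown path and the bootable server list can be sketched as follows. The names are hypothetical, the resource pool is modeled as a Python set, and `power_off` is an assumed callback that returns True when the shutdown succeeded.

```python
def shut_down_idle_servers(targets, resource_pool, bootable_servers, power_off):
    """Shut down each target and record the successful ones as bootable."""
    for server in targets:
        # Remove the server from the resource pool first so that no new task
        # is scheduled onto it while it is being shut down.
        resource_pool.discard(server)
        if power_off(server):
            # Only servers that shut down successfully become candidates for
            # a later automatic power-on.
            bootable_servers.append(server)
    return bootable_servers


pool = {"node-01", "node-02", "node-03"}
bootable = []
shut_down_idle_servers(
    targets=["node-01", "node-02"],
    resource_pool=pool,
    bootable_servers=bootable,
    power_off=lambda server: server != "node-02",  # node-02 fails to shut down
)
print(bootable)      # ['node-01']
print(sorted(pool))  # ['node-03']
```

Because only successfully shut-down servers enter the list, the power-on path never tries to start a machine whose state is unknown.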

Regarding the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.

The above server management apparatus can execute the server management method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.

Embodiment 6

FIG. 6 is a schematic structural diagram of a computer device according to Embodiment 6 of the present invention. FIG. 6 shows a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present invention. The computer device 12 shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in FIG. 6, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to, one or more processors 16, a memory 28, and a bus 18 that connects the various system components (including the memory 28 and the processors 16).

The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

The computer device 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer device 12, including volatile and non-volatile media and removable and non-removable media.

The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 6, commonly referred to as a "hard disk drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (for example, a CD-ROM, DVD-ROM, or other optical medium) may also be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.

The computer device 12 may also communicate with one or more external devices 14 (for example, a keyboard, a pointing device, or a display 24), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (for example, a network card or a modem) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. The computer device 12 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in FIG. 6, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

By running the programs stored in the memory 28, the processor 16 executes various functional applications and data processing, thereby implementing the server management method provided by the embodiments of the present invention: acquiring the number of queued tasks and the number of idle servers in a computer cluster; judging whether the number of queued tasks is greater than a preset task-number threshold and whether the number of idle servers is less than a preset number of servers; and, if the number of queued tasks is greater than the preset task-number threshold and the number of idle servers is less than the preset number of servers, obtaining a target available server from the bootable server list of the computer cluster and controlling the target idle server to perform a power-on operation, wherein the available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process.

Embodiment 7

Embodiment 7 of the present invention provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the server management method provided by the embodiments of the present invention: acquiring the number of queued tasks and the number of idle servers in a computer cluster; judging whether the number of queued tasks is greater than a preset task-number threshold and whether the number of idle servers is less than a preset number of servers; and, if the number of queued tasks is greater than the preset task-number threshold and the number of idle servers is less than the preset number of servers, obtaining a target available server from the bootable server list of the computer cluster and controlling the target idle server to perform a power-on operation, wherein the available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process.

Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.

Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical fiber cable, RF, or any suitable combination of the above.

Computer program code for performing the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or computer device. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.

Claims (8)

1. A server management method, comprising:
acquiring the number of queued tasks and the number of idle servers of a computer cluster;
judging whether the number of queued tasks is greater than a preset task-number threshold and whether the number of idle servers is less than a preset number of servers;
if the number of queued tasks is greater than the preset task-number threshold and the number of idle servers is less than the preset number of servers, obtaining a target available server from a bootable server list of the computer cluster, and controlling the target idle server to perform a power-on operation;
wherein the available servers in the bootable server list are servers that were successfully shut down in an automatic shutdown process.
2. The method of claim 1, further comprising:
judging whether an available server exists in the bootable server list of the computer cluster.
3. The method of claim 1, wherein acquiring the number of queued tasks and the number of idle servers of the computer cluster comprises:
acquiring the number of queued tasks and the number of idle servers of the computer cluster periodically, at a preset power-on time interval.
4. The method of claim 1, further comprising, after controlling the target idle server to perform the power-on operation:
after waiting for a preset power-on time period, judging whether the target available server has powered on successfully;
if the target available server has powered on successfully, initializing the target available server;
after waiting for a preset initialization time period, judging whether the target available server has been initialized successfully;
and if the target available server has been initialized successfully, writing normal-online information of the target available server to a log file.
5. The method of claim 1, further comprising:
acquiring the number of idle servers of the computer cluster;
judging, according to the number of idle servers, whether the computer cluster satisfies an idle-server shutdown condition;
and if the computer cluster satisfies the idle-server shutdown condition, obtaining a target idle server from the idle servers of the computer cluster, controlling the target idle server to perform a shutdown operation, and adding the target idle server that has shut down successfully to the bootable server list of the computer cluster.
6. A server management apparatus, comprising:
a quantity acquisition module, configured to acquire the number of queued tasks and the number of idle servers of a computer cluster;
a quantity judgment module, configured to judge whether the number of queued tasks is greater than a preset task-number threshold and whether the number of idle servers is less than a preset number of servers;
a server power-on module, configured to, if the number of queued tasks is greater than the preset task-number threshold and the number of idle servers is less than the preset number of servers, obtain a target available server from a bootable server list of the computer cluster, and control the target idle server to perform a power-on operation;
wherein the available servers in the bootable server list are servers that were successfully shut down in an automatic shutdown process.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the server management method according to any of claims 1-5 when executing the computer program.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the server management method according to any one of claims 1 to 5.
CN202010760328.9A 2020-07-31 2020-07-31 Server management method, device, equipment and storage medium Pending CN111930502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010760328.9A CN111930502A (en) 2020-07-31 2020-07-31 Server management method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111930502A true CN111930502A (en) 2020-11-13

Family

ID=73315098

Country Status (1)

Country Link
CN (1) CN111930502A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2292040A1 (en) * 1999-03-25 2000-09-25 International Business Machines Corporation Interface system and method for asynchronously updating a shared resource
JP2008047096A (en) * 2006-08-14 2008-02-28 Fuji Xerox Co Ltd Computer system, method, and program for queuing
CN103645956A (en) * 2013-12-18 2014-03-19 浪潮电子信息产业股份有限公司 Intelligent cluster load management method
CN110764892A (en) * 2019-10-22 2020-02-07 北京字节跳动网络技术有限公司 Task processing method, device and computer readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668882A (en) * 2020-12-29 2021-04-16 浙江科钛机器人股份有限公司 Autonomous survival detection and distributed coordination method for mobile robot cluster
CN112668882B (en) * 2020-12-29 2024-04-16 浙江科钛机器人股份有限公司 Mobile robot cluster autonomous survival detection and distributed coordination method
CN114443297A (en) * 2022-01-21 2022-05-06 北京金山云网络技术有限公司 Computing task processing method, device, storage medium and electronic device
CN119030970A (en) * 2024-10-28 2024-11-26 成都掠食鸟科技有限公司 A system for remote file transfer and storage via remote equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201113