CN111930502A - Server management method, device, equipment and storage medium - Google Patents
- Publication number
- CN111930502A (CN202010760328.9A)
- Authority
- CN
- China
- Prior art keywords
- server
- servers
- idle
- computer cluster
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Power Sources (AREA)
Abstract
Embodiments of the present invention disclose a server management method, apparatus, device, and storage medium. The method includes: acquiring the number of queued tasks and the number of idle servers in a computer cluster; determining whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset number of servers; and if so, acquiring a target available server from the bootable server list of the computer cluster and controlling the target idle server to perform a power-on operation, where the available servers in the bootable server list are servers that were successfully shut down in an automatic shutdown process. Embodiments of the present invention can automatically power on servers according to how scarce resources are for the cluster's current computing tasks, which ensures timely processing of computing tasks, keeps the power consumption of the whole cluster at a level commensurate with the computing workload, and avoids wasting resources.
Description
Technical Field
Embodiments of the present invention relate to the field of computer technologies, and in particular to a server management method, apparatus, device, and storage medium.
Background
In a computer cluster, the computing resources are usually made up of multiple servers. A computer cluster increases computing power by distributing computing tasks across its servers.
In the related art, the servers in a computer cluster are usually powered on and off together. Once powered on, all servers stay on. A server that is assigned a computing task performs the corresponding computation; a server with no task stays on and waits for one.
In actual operation, the utilization of a computer cluster changes dynamically. Utilization may be low during some periods, while a surge of tasks at other times puts resources under strain. When utilization is low, keeping all servers powered on, as in the related art, wastes resources.
Summary of the Invention
Embodiments of the present invention provide a server management method, apparatus, device, and storage medium that can dynamically power on an appropriate number of servers according to the actual operating conditions of a computer cluster, so that the power consumption of the whole cluster stays at a level commensurate with the computing workload and resources are not wasted.
In a first aspect, an embodiment of the present invention provides a server management method, including:
acquiring the number of queued tasks and the number of idle servers in a computer cluster;
determining whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset number of servers;
if the number of queued tasks is greater than the preset task number threshold and the number of idle servers is less than the preset number of servers, acquiring a target available server from the bootable server list of the computer cluster, and controlling the target idle server to perform a power-on operation;
where the available servers in the bootable server list are servers that were successfully shut down in an automatic shutdown process.
In a second aspect, an embodiment of the present invention further provides a server management apparatus, including:
a quantity acquisition module, configured to acquire the number of queued tasks and the number of idle servers in a computer cluster;
a quantity judgment module, configured to determine whether the number of queued tasks is greater than a preset task number threshold and whether the number of idle servers is less than a preset number of servers;
a server power-on module, configured to, if the number of queued tasks is greater than the preset task number threshold and the number of idle servers is less than the preset number of servers, acquire a target available server from the bootable server list of the computer cluster and control the target idle server to perform a power-on operation;
where the available servers in the bootable server list are servers that were successfully shut down in an automatic shutdown process.
In a third aspect, an embodiment of the present invention further provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the server management method according to the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the server management method according to the embodiments of the present invention.
In the technical solution of the embodiments of the present invention, the number of queued tasks and the number of idle servers in the computer cluster are acquired; it is then determined whether the number of queued tasks is greater than the preset task number threshold and whether the number of idle servers is less than the preset number of servers; and when both conditions hold, a target available server is acquired from the bootable server list of the computer cluster and the target idle server is controlled to perform a power-on operation. The number of queued tasks and the number of idle servers together indicate how scarce resources are for the cluster's current computing tasks. When they show that resources are scarce and servers need to be powered on automatically, an appropriate number of servers is powered on dynamically. This ensures timely processing of computing tasks, keeps the power consumption of the whole cluster at a level commensurate with the computing workload, and avoids wasting resources.
Brief Description of the Drawings
FIG. 1 is a flowchart of a server management method according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart of a server management method according to Embodiment 2 of the present invention.
FIG. 3 is a flowchart of a server management method according to Embodiment 3 of the present invention.
FIG. 4 is a flowchart of a server management method according to Embodiment 4 of the present invention.
FIG. 5 is a schematic structural diagram of a server management apparatus according to Embodiment 5 of the present invention.
FIG. 6 is a schematic structural diagram of a computer device according to Embodiment 6 of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
It should also be noted that, for ease of description, the drawings show only the parts relevant to the present invention rather than everything. Before the exemplary embodiments are discussed in more detail, it should be mentioned that some of them are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations can be rearranged. A process may be terminated when its operations are complete, but it may also have additional steps not shown in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram, and so on.
Embodiment 1
FIG. 1 is a flowchart of a server management method according to Embodiment 1 of the present invention. This embodiment is applicable to managing the servers in a computer cluster. The method may be performed by the server management apparatus provided by the embodiments of the present invention; the apparatus may be implemented in software and/or hardware and can generally be integrated into a computer device, for example the management server of a computer cluster. The management server is the server that manages all servers in the cluster. As shown in FIG. 1, the method of this embodiment specifically includes:
Step 101: Acquire the number of queued tasks and the number of idle servers in the computer cluster.
In this embodiment, the number of queued tasks of the computer cluster is the number of queued tasks of all users in the cluster. The number of idle servers is the count of all idle servers in the cluster. An idle server is a server whose computing resources are all idle. For example, the computing resources may be graphics processing unit (GPU) cards.
Optionally, acquiring the number of queued tasks and the number of idle servers in the computer cluster may include: acquiring both numbers periodically, at a preset power-on time interval.
The preset power-on time interval can be set according to business requirements. For example, it may be 15 minutes. In that case, every 15 minutes the number of queued tasks and the number of idle servers are acquired; it is determined whether the number of queued tasks is greater than the preset task number threshold and whether the number of idle servers is less than the preset number of servers; and if both conditions hold, a target available server is acquired from the bootable server list of the computer cluster and the target idle server is controlled to perform a power-on operation, where the available servers in the bootable server list are servers that were successfully shut down in an automatic shutdown process. In this way, the automatic power-on process is executed once every preset power-on time interval.
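The periodic trigger can be sketched as a simple loop. This is a minimal illustration, not the patent's implementation: `run_periodically`, `cycle`, `max_cycles`, and the injectable `sleep` are hypothetical names introduced here, and 900 seconds corresponds to the 15-minute example interval.

```python
import time
from typing import Callable


def run_periodically(cycle: Callable[[], None],
                     interval_s: float,
                     max_cycles: int,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `cycle` once per interval and return how many cycles ran.

    `cycle` stands for one pass of the automatic power-on process.
    `max_cycles` bounds the loop so the sketch terminates; a real
    daemon would more likely loop until it is stopped.
    """
    done = 0
    while done < max_cycles:
        cycle()
        done += 1
        if done < max_cycles:
            sleep(interval_s)  # wait for the preset power-on time interval
    return done


if __name__ == "__main__":
    # Three passes with a no-op sleep, so the demo finishes immediately.
    run_periodically(lambda: print("power-on check"), 900, 3, sleep=lambda s: None)
```

In practice the same effect is often achieved with an external scheduler such as cron, which keeps the check itself stateless.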
Optionally, the number of queued tasks and the number of idle servers in the computer cluster can be acquired through preset script commands written for that purpose.
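The text does not say which commands the script uses. As one possible sketch, assuming the cluster is scheduled by SLURM (which the description mentions only for initialization), the counts could be derived from `squeue` and `sinfo` output. The function name `get_cluster_counts` and the injectable `run` parameter are hypothetical, and treating SLURM's `idle` node state as the patent's "idle server" is an assumption.

```python
import subprocess
from typing import Callable, List, Tuple

Runner = Callable[[List[str]], str]


def _run(cmd: List[str]) -> str:
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def get_cluster_counts(run: Runner = _run) -> Tuple[int, int]:
    """Return (queued_tasks, idle_servers) from SLURM command output.

    Assumption: pending jobs stand for "queued tasks" and nodes in the
    `idle` state stand for "idle servers".
    """
    # One pending job ID per line, no header.
    pending = run(["squeue", "-h", "-t", "PENDING", "-o", "%i"])
    # One idle node name per line; a node that belongs to several
    # partitions may be listed more than once, hence the set below.
    idle = run(["sinfo", "-h", "-N", "-t", "idle", "-o", "%N"])
    queued_tasks = sum(1 for line in pending.splitlines() if line.strip())
    idle_servers = len({line.strip() for line in idle.splitlines() if line.strip()})
    return queued_tasks, idle_servers
```

Passing the command runner as a parameter lets the parsing be exercised without a live cluster.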
Step 102: Determine whether the number of queued tasks is greater than the preset task number threshold and whether the number of idle servers is less than the preset number of servers. If so, perform step 103; if not, perform step 104.
Optionally, the preset task number threshold can be determined according to how frequently tasks arrive. For example, the threshold is 10. In practice, users submitting unreasonable requests for computing resources can easily cause tasks to queue. Empirically, a queue of 10 or fewer tasks can be considered normal.
It is determined whether the number of queued tasks is greater than the preset task number threshold. If it is, the queue is longer than normal, resources for the current computing tasks are scarce, and servers need to be powered on automatically to ensure that computing tasks are processed in time. If the number of queued tasks is less than or equal to the threshold, the queue is within the normal range, resources are not particularly scarce, and for the time being no server needs to be powered on automatically.
Optionally, the preset number of servers can be determined according to the number of available machines. For example, the preset number of servers is 2. In practice, a server is often configured with 8 GPU cards; when only one server is idle and a user submits a computing task that requires 16 GPU cards, the task stays queued. Setting the preset number of servers to 2 therefore helps power on a server in time to provide resources for that task.
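The arithmetic behind the example can be written out as a short sketch. The helper name `servers_needed` is hypothetical; the default of 8 GPU cards per server is the figure from the example.

```python
def servers_needed(gpus_requested: int, gpus_per_server: int = 8) -> int:
    """Smallest number of fully idle servers that can hold the request."""
    # Integer ceiling division: round up when the request does not
    # divide evenly by the cards available per server.
    return -(-gpus_requested // gpus_per_server)


# A 16-card task needs 2 idle servers, so a single idle server is not
# enough and the task queues, as in the example.
print(servers_needed(16))
```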
It is determined whether the number of queued tasks is greater than the preset task number threshold and whether the number of idle servers is less than the preset number of servers. If both conditions hold, the queue is longer than normal and fewer idle servers are available than normal, so resources for the current computing tasks are scarce and servers need to be powered on automatically to ensure that computing tasks are processed in time. If the number of queued tasks is less than or equal to the threshold, or the number of idle servers is greater than or equal to the preset number of servers, resources are not particularly scarce, and for the time being no server needs to be powered on automatically.
Assessing resource scarcity from both the number of queued tasks and the number of idle servers gives a more reasonable decision on whether servers need to be powered on automatically.
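The decision in step 102 reduces to a two-part predicate. The sketch below uses the example values from the text (threshold 10, preset number of servers 2) as defaults; the function and parameter names are hypothetical.

```python
def should_power_on(queued_tasks: int,
                    idle_servers: int,
                    task_threshold: int = 10,
                    server_threshold: int = 2) -> bool:
    """True only when the queue is longer than normal AND too few
    servers are idle; both comparisons are strict, as in step 102."""
    return queued_tasks > task_threshold and idle_servers < server_threshold


print(should_power_on(11, 1))  # long queue, one idle server
print(should_power_on(10, 1))  # queue at the threshold counts as normal
print(should_power_on(11, 2))  # enough idle servers already
```

Requiring both conditions avoids powering on servers when a long queue is caused by something other than a shortage of idle machines.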
Optionally, the method may further include: determining whether an available server exists in the bootable server list of the computer cluster.
In this embodiment, this check can be made before determining whether the number of queued tasks is greater than the preset task number threshold and whether the number of idle servers is less than the preset number of servers. Optionally, it can instead be made after that determination.
The bootable server list of the computer cluster keeps track of the available servers in the cluster that can be powered on. An available server is a server that can be powered on. Optionally, the available servers in the bootable server list may be servers that were successfully shut down in the automatic shutdown process.
If the bootable server list contains no available server, then even if the subsequent flow calls for powering on a server automatically, it cannot do so, and the process can end. If the list does contain an available server, the subsequent flow may succeed in powering on a server, so the subsequent steps can continue.
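One way to combine the list check with the threshold check is sketched below, with the list check placed first (the text allows either order). The function name `plan_cycle` and the returned labels are hypothetical.

```python
from typing import List


def plan_cycle(queued_tasks: int,
               idle_servers: int,
               bootable: List[str],
               task_threshold: int = 10,
               server_threshold: int = 2) -> str:
    """Decide what one pass of the power-on process should do.

    Returns "end" when nothing can be powered on, "power_on" when a
    server should be started (step 103), and "log_status" otherwise
    (step 104).
    """
    if not bootable:
        # No server can be powered on, so the pass ends early.
        return "end"
    if queued_tasks > task_threshold and idle_servers < server_threshold:
        return "power_on"
    return "log_status"
```

Checking the list first skips the threshold comparison entirely when there is nothing to power on.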
Step 103: Acquire a target available server from the bootable server list of the computer cluster, and control the target idle server to perform a power-on operation.
The available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process.
Optionally, the automatic shutdown process may be: acquiring the number of idle servers in the computer cluster; determining, according to the number of idle servers, whether the cluster satisfies an idle server shutdown condition; and if it does, acquiring a target idle server from the idle servers of the cluster, controlling the target idle server to perform a shutdown operation, and adding each target idle server that was shut down successfully to the bootable server list of the cluster.
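The shutdown side, which is what populates the bootable server list, might look like the sketch below. The shutdown condition is not specified in this passage; the rule used here (more idle servers than `max_idle`) is purely an assumption for illustration, and `auto_shutdown_cycle` and `power_off` are hypothetical names.

```python
from typing import Callable, List


def auto_shutdown_cycle(idle_servers: List[str],
                        bootable: List[str],
                        max_idle: int,
                        power_off: Callable[[str], bool]) -> List[str]:
    """Shut down surplus idle servers and record the successes.

    `power_off(host)` returns True when the shutdown succeeded. Only
    servers that were shut down successfully are appended to
    `bootable`, matching the definition of the bootable server list.
    """
    # Assumed shutdown condition: more idle servers than we want to keep.
    if len(idle_servers) <= max_idle:
        return []
    shut_down = []
    for host in idle_servers[max_idle:]:
        if power_off(host):
            bootable.append(host)
            shut_down.append(host)
    return shut_down
```

A server whose shutdown fails is left out of the list, so the power-on process never tries to start a machine that is still running.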
If the number of queued tasks is greater than the preset task number threshold and the number of idle servers is less than the preset number of servers, a target available server is acquired from the bootable server list of the computer cluster and the target idle server is controlled to perform a power-on operation.
Optionally, acquiring a target available server from the bootable server list of the computer cluster and controlling the target idle server to perform a power-on operation may include: selecting one available server from the bootable server list as the target available server; and controlling the target idle server to perform the power-on operation through an Intelligent Platform Management Interface (IPMI) command.
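A sketch of the selection and power-on step is shown below, assuming the `ipmitool` command-line client is used to send the IPMI command and that each list entry is the address of a server's management controller. The function names, the "first entry" selection rule, and the credentials are hypothetical; the text only says that one available server is selected.

```python
import subprocess
from typing import Callable, List, Optional


def ipmi_power_on_command(bmc_host: str, user: str, password: str) -> List[str]:
    """Build an ipmitool invocation that powers on a chassis over the LAN."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host,
            "-U", user, "-P", password, "chassis", "power", "on"]


def select_and_power_on(bootable: List[str],
                        user: str,
                        password: str,
                        run: Callable[..., object] = subprocess.run) -> Optional[str]:
    """Pick one server from the bootable list and send it a power-on command.

    Returns the selected server, or None when the list is empty.
    How the list is updated after a power-on is not described in this
    passage, so this sketch leaves the list unchanged.
    """
    if not bootable:
        return None
    target = bootable[0]  # assumed rule: take the first entry
    run(ipmi_power_on_command(target, user, password), check=True)
    return target
```

Sending the command only requests the power-on; whether the server actually came up is verified separately, after a waiting period.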
Optionally, after controlling the target idle server to perform the power-on operation, the method may further include: after waiting for a preset power-on time period, determining whether the target available server was powered on successfully; if it was, performing an initialization operation on the target available server; after waiting for a preset initialization time period, determining whether the target available server was initialized successfully; and if it was, writing normal online information for the target available server to a log file.
The preset power-on time period can be determined according to the time a server needs to power on. For example, powering on a server usually takes 5 minutes, so the preset power-on time period is set to 5 minutes. The preset initialization time period can be determined according to the time a server needs to initialize. The normal online information records that the target available server successfully completed the power-on and initialization operations in the current automatic power-on process and came online normally.
Optionally, the target available server is tested with a Packet Internet Groper (PING) command to determine whether it was powered on successfully.
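The reachability test could be sketched as follows. The `-c` and `-W` options are those of the Linux `ping` utility, which is an assumption about the management server's platform; `is_reachable` and the injectable `run` parameter are hypothetical names.

```python
import subprocess
from typing import Callable, List, Optional


def is_reachable(host: str,
                 run: Optional[Callable[[List[str]], int]] = None) -> bool:
    """Return True when a single ping to `host` gets a reply.

    `run` takes the command and returns its exit status; by default
    the command is executed with its output discarded.
    """
    # One echo request, waiting at most 2 seconds for the reply.
    cmd = ["ping", "-c", "1", "-W", "2", host]
    if run is None:
        run = lambda c: subprocess.run(
            c, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode
    return run(cmd) == 0
```

A reply only shows that the network stack is up, which is why the described flow follows the check with a separate initialization step.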
Optionally, if the target available server was not powered on successfully, power-on failure information for the server is written to the log file, so that operations staff who review the log file periodically can intervene manually and maintain the server. The power-on failure information records that the target available server did not power on successfully in the current automatic power-on process.
Optionally, the initialization operation may include checking the memory swap partition (swap), synchronizing the configuration file of the SLURM resource management system, initializing the graphics cards, checking storage mounts, and checking that the scheduling system services are running normally.
Optionally, if the target available server was not initialized successfully, initialization failure information for the server is written to the log file, so that operations staff who review the log file periodically can intervene manually and maintain the server. The initialization failure information records that the target available server was not initialized successfully in the current automatic power-on process.
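The post-power-on sequence (wait, verify, initialize, wait, verify, log) can be sketched with the individual checks passed in as callables. All names are hypothetical; 300 seconds matches the 5-minute example, while the 60-second initialization wait is an assumed placeholder because the text gives no value.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("auto_power_on")


def bring_online(host: str,
                 is_up: Callable[[str], bool],
                 initialize: Callable[[str], None],
                 is_initialized: Callable[[str], bool],
                 boot_wait_s: float = 300,
                 init_wait_s: float = 60,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Verify and initialize a server that was just told to power on.

    Returns "online", "boot_failed", or "init_failed", and logs the
    outcome so operations staff can follow up on failures.
    """
    sleep(boot_wait_s)  # preset power-on time period
    if not is_up(host):
        log.error("%s: power-on did not succeed", host)
        return "boot_failed"
    initialize(host)
    sleep(init_wait_s)  # preset initialization time period (assumed value)
    if not is_initialized(host):
        log.error("%s: initialization did not succeed", host)
        return "init_failed"
    log.info("%s: powered on, initialized, and online", host)
    return "online"
```

Attaching a `logging.FileHandler` to the logger sends these records to a log file, as the description calls for.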
Step 104: Write the current resource status information of the computer cluster to a log file.
In this embodiment, if the number of queued tasks is less than or equal to the preset task number threshold, or the number of idle servers is greater than or equal to the preset number of servers, the current resource status information of the computer cluster is written to the log file, so that operations staff who review the log file periodically can see the cluster's resource status during the current automatic power-on process.
Optionally, the current resource status information of the computer cluster includes the number of queued tasks and the number of idle servers in the cluster.
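Step 104 amounts to appending one status line per pass. The line format, the function name `log_resource_status`, and the optional `now` parameter are assumptions made for this sketch; the text only specifies which two numbers the entry contains.

```python
from datetime import datetime
from typing import Optional


def log_resource_status(path: str,
                        queued_tasks: int,
                        idle_servers: int,
                        now: Optional[datetime] = None) -> str:
    """Append the cluster's current resource status to a log file.

    Returns the line that was written. `now` can be supplied to make
    the timestamp deterministic; it defaults to the current time.
    """
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    line = f"{stamp} queued_tasks={queued_tasks} idle_servers={idle_servers}\n"
    with open(path, "a", encoding="utf-8") as f:  # append, never truncate
        f.write(line)
    return line
```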
This embodiment of the present invention provides a server management method in which the number of queued tasks and the number of idle servers in the computer cluster are acquired; it is then determined whether the number of queued tasks is greater than the preset task number threshold and whether the number of idle servers is less than the preset number of servers; and when both conditions hold, a target available server is acquired from the bootable server list of the computer cluster and the target idle server is controlled to perform a power-on operation. The number of queued tasks and the number of idle servers together indicate how scarce resources are for the cluster's current computing tasks. When they show that resources are scarce and servers need to be powered on automatically, an appropriate number of servers is powered on dynamically. This ensures timely processing of computing tasks, keeps the power consumption of the whole cluster at a level commensurate with the computing workload, and avoids wasting resources.
Embodiment 2
FIG. 2 is a flowchart of a server management method according to Embodiment 2 of the present invention. This embodiment may be combined with the optional solutions of one or more of the foregoing embodiments. In this embodiment, the server management method may further include: judging whether an available server exists in the bootable server list of the computer cluster. In addition, after the target available server is controlled to perform the power-on operation, the method may further include: after waiting for a preset power-on period, judging whether the target available server has powered on successfully; if it has, performing an initialization operation on the target available server; after waiting for a preset initialization period, judging whether the target available server has been initialized successfully; and if it has, writing the normal-online information of the target available server to the log file.
As shown in FIG. 2, the method of this embodiment specifically includes:
Step 201: Acquire the number of queued tasks and the number of idle servers in the computer cluster.
For details not described in this embodiment, refer to the foregoing embodiments.
Step 202: Judge whether an available server exists in the bootable server list of the computer cluster: if yes, perform step 203; if no, end the process.
In this embodiment, the bootable server list of the computer cluster maintains the available servers in the cluster that can be powered on. An available server is a server that can be powered on. Optionally, the available servers in the bootable server list may be servers that were successfully shut down in the automatic shutdown process.
If no available server exists in the bootable server list of the computer cluster, then even if a server needed to be powered on automatically later in the process, this could not be done, so the process can end at this point. If an available server does exist in the bootable server list, a server can potentially be powered on later in the process, so the subsequent steps can continue.
Step 203: Judge whether the number of queued tasks is greater than the preset task-number threshold and whether the number of idle servers is less than the preset number of servers: if yes, perform step 204; if no, perform step 211.
In this embodiment, it is judged whether the number of queued tasks is greater than the preset task-number threshold and whether the number of idle servers is less than the preset number of servers. If the number of queued tasks is greater than the threshold and the number of idle servers is less than the preset number of servers, the number of queued tasks is above the normal value and the number of usable idle servers is below the normal value; resources for the current computing tasks are scarce, and servers need to be powered on automatically to ensure timely processing of computing tasks. If the number of queued tasks is less than or equal to the threshold, or the number of idle servers is greater than or equal to the preset number of servers, resources for the current computing tasks are not particularly scarce, and for the time being no server needs to be powered on automatically.
Determining how scarce resources are for the cluster's current computing tasks from both the number of queued tasks and the number of idle servers allows a more reasonable judgment of whether servers need to be powered on automatically.
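The judgment in step 203 can be sketched as a single predicate. This is a minimal illustration; the function and parameter names are assumptions, and the threshold values in the usage note are made up for the example.

```python
def needs_power_on(queued_tasks: int, idle_servers: int,
                   task_threshold: int, server_threshold: int) -> bool:
    """Return True only when both conditions of step 203 hold.

    Both comparisons are strict: a value equal to its threshold does not
    trigger power-on and leads to step 211 instead.
    """
    return queued_tasks > task_threshold and idle_servers < server_threshold
```

For example, with a hypothetical task-number threshold of 10 and a preset number of servers of 2, a cluster with 12 queued tasks and 1 idle server qualifies, whereas 10 queued tasks, or 2 idle servers, does not.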
Step 204: Obtain a target available server from the bootable server list of the computer cluster, and control the target available server to perform a power-on operation.
Optionally, obtaining the target available server from the bootable server list of the computer cluster and controlling it to perform the power-on operation may include: selecting one available server from the bootable server list of the computer cluster as the target available server; and controlling the target available server to perform the power-on operation through an IPMI command.
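The text does not say which IPMI client is used. A minimal sketch, assuming the `ipmitool` command-line utility and its `chassis power on` subcommand, might look as follows. The BMC address, user name, and password are hypothetical inputs, and the helper names are introduced only for this example.

```python
import subprocess
from typing import Callable, List

def build_power_on_command(bmc_host: str, user: str, password: str) -> List[str]:
    """Build an ipmitool invocation that asks the server's BMC to power on."""
    # Assumption: the BMC is reachable over the network via the lanplus interface.
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host,
            "-U", user, "-P", password, "chassis", "power", "on"]

def power_on(bmc_host: str, user: str, password: str,
             run: Callable = subprocess.run) -> bool:
    """Send the power-on request; True means the command itself succeeded."""
    result = run(build_power_on_command(bmc_host, user, password),
                 capture_output=True, text=True)
    return result.returncode == 0
```

A zero exit status only shows that the request was accepted, which is why the method waits and then checks the server separately in step 205.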
Step 205: After waiting for the preset power-on period, judge whether the target available server has powered on successfully: if yes, perform step 206; if no, perform step 210.
In this embodiment, the preset power-on period may be determined according to the time a server needs to power on. For example, if powering on a server usually takes 5 minutes, the preset power-on period is set to 5 minutes.
Optionally, the target available server is tested with the PING command to judge whether it has powered on successfully.
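The PING test can be sketched as below. This is an assumption-laden example: it uses the Linux `ping` options `-c` (packet count) and `-W` (reply timeout in seconds), and the function name and default timeout are hypothetical.

```python
import subprocess
from typing import Callable

def is_powered_on(host: str, timeout_s: int = 2,
                  run: Callable = subprocess.run) -> bool:
    """Send one echo request; treat a reply as evidence the server is up."""
    # Linux ping: -c 1 sends a single packet, -W sets the reply timeout.
    result = run(["ping", "-c", "1", "-W", str(timeout_s), host],
                 capture_output=True)
    return result.returncode == 0
```

The `run` parameter exists so that the check can be exercised without a real network, for instance by passing a stub that returns a fixed exit status.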
Step 206: Perform an initialization operation on the target available server.
Optionally, the initialization operation may include checking the memory swap partition (swap), synchronizing the configuration file of the resource management system SLURM, initializing the graphics cards, checking the storage mounts, and checking that the scheduling system services are running normally.
Step 207: After waiting for the preset initialization period, judge whether the target available server has been initialized successfully: if yes, perform step 208; if no, perform step 209.
In this embodiment, the preset initialization period may be determined according to the time the server initialization operation needs.
Step 208: Write the normal-online information of the target available server to the log file.
In this embodiment, the normal-online information records that the target available server successfully completed the power-on operation and the initialization operation in the current automatic power-on process and has come online normally.
Step 209: Write the initialization-failure information of the target available server to the log file.
In this embodiment, if the target available server is not initialized successfully, its initialization-failure information is written to the log file, so that operation and maintenance personnel, when periodically reviewing the log file, can manually intervene in and maintain the target available server on the basis of that information. The initialization-failure information records that the target available server was not initialized successfully in the current automatic power-on process.
Step 210: Write the power-on-failure information of the target available server to the log file.
In this embodiment, if the target available server does not power on successfully, its power-on-failure information is written to the log file, so that operation and maintenance personnel, when periodically reviewing the log file, can manually intervene in and maintain the target available server on the basis of that information. The power-on-failure information records that the target available server did not power on successfully in the current automatic power-on process.
Step 211: Write the current resource status information of the computer cluster to the log file.
In this embodiment, if the number of queued tasks is less than or equal to the preset task-number threshold, or the number of idle servers is greater than or equal to the preset number of servers, the current resource status information of the computer cluster is written to the log file, so that operation and maintenance personnel, when periodically reviewing the log file, can determine the cluster's resource status during the current automatic power-on process.
This embodiment of the present invention provides a server management method. The method acquires the number of queued tasks and the number of idle servers in a computer cluster; when an available server is found in the bootable server list of the computer cluster, it judges whether the number of queued tasks is greater than the preset task-number threshold and whether the number of idle servers is less than the preset number of servers; when both conditions hold, it obtains a target available server from the bootable server list, controls the target available server to perform a power-on operation, and writes the corresponding information to the log file according to the power-on result and the initialization result of the target available server. In this way, how scarce resources are for the cluster's current computing tasks can be determined from the number of queued tasks and the number of idle servers, and, when resources are found to be scarce and servers need to be powered on automatically, an appropriate number of servers can be powered on dynamically. This ensures timely processing of computing tasks, keeps the power consumption of the whole cluster at a level that matches the computing workload, and avoids wasting resources. Because the information written to the log file reflects the power-on result and the initialization result of each server, operation and maintenance personnel, when periodically reviewing the log file, can manually intervene in and maintain the target available server on the basis of that information.
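Steps 201 to 211 of this embodiment can be summarized in the following sketch. It is an illustration under stated assumptions, not the implementation of the method: every callable passed in (`power_on`, `is_up`, `initialize`, `is_initialized`, `log`) is a hypothetical hook, the return strings and log messages are invented for the example, the first entry of the list is taken as the target, and the 60-second initialization wait is a placeholder. Only the 5-minute power-on wait comes from the text.

```python
import time
from typing import Callable, List

def auto_power_on(queued_tasks: int, idle_servers: int, bootable: List[str],
                  task_threshold: int, server_threshold: int,
                  power_on: Callable[[str], None],
                  is_up: Callable[[str], bool],
                  initialize: Callable[[str], None],
                  is_initialized: Callable[[str], bool],
                  log: Callable[[str], None],
                  boot_wait_s: float = 300.0,   # 5 minutes, as in the example above
                  init_wait_s: float = 60.0,    # placeholder value (assumption)
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Run one pass of the automatic power-on flow and report the outcome."""
    if not bootable:                                   # step 202
        return "no-bootable-server"
    if not (queued_tasks > task_threshold
            and idle_servers < server_threshold):      # step 203
        log(f"resources: queued={queued_tasks} idle={idle_servers}")  # step 211
        return "not-needed"
    target = bootable[0]                               # step 204
    power_on(target)
    sleep(boot_wait_s)                                 # step 205
    if not is_up(target):
        log(f"{target}: power-on failed")              # step 210
        return "power-on-failed"
    initialize(target)                                 # step 206
    sleep(init_wait_s)                                 # step 207
    if not is_initialized(target):
        log(f"{target}: initialization failed")        # step 209
        return "initialization-failed"
    log(f"{target}: online")                           # step 208
    return "online"
```

Passing the waits and checks in as parameters keeps the control flow testable; in a deployment they would be bound to real power-on, ping, and initialization routines.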
Embodiment 3
FIG. 3 is a flowchart of a server management method according to Embodiment 3 of the present invention. This embodiment may be combined with the optional solutions of one or more of the foregoing embodiments. In this embodiment, the server management method may further include: acquiring the number of idle servers in the computer cluster; judging, according to the number of idle servers, whether the computer cluster satisfies the idle-server shutdown condition; and, if the computer cluster satisfies the idle-server shutdown condition, obtaining target idle servers from the idle servers of the computer cluster, controlling the target idle servers to perform a shutdown operation, and adding each target idle server that has shut down successfully to the bootable server list of the computer cluster.
As shown in FIG. 3, the method of this embodiment specifically includes:
Step 301: Acquire the number of idle servers in the computer cluster.
In this embodiment, the number of idle servers is the number of all idle servers in the computer cluster. An idle server is a server whose computing resources are all idle. For example, the computing resources may be GPU cards.
Optionally, acquiring the number of idle servers in the computer cluster may include: periodically acquiring the number of idle servers in the computer cluster at a preset shutdown time interval.
The preset shutdown time interval can be set according to business needs. For example, the preset shutdown time interval may be one day. In that case, once a day, the number of idle servers in the computer cluster is acquired; whether the computer cluster satisfies the idle-server shutdown condition is judged according to the number of idle servers; and, if the condition is satisfied, target idle servers are obtained from the idle servers of the computer cluster, the target idle servers are controlled to perform a shutdown operation, and each target idle server that has shut down successfully is added to the bootable server list of the computer cluster. The automatic shutdown process is thus executed once per preset shutdown time interval.
Optionally, because the shutdown process should not be executed frequently, the preset shutdown time interval is longer than the preset power-on time interval described above.
Experience shows that users usually submit the fewest computing tasks in the early hours after midnight, when the utilization of the computer cluster is at its lowest, so this is often the most suitable time to trigger the automatic shutdown process. The number of idle servers in the computer cluster can therefore be acquired in the early hours of each day, according to the preset shutdown time interval. Specifically, whether the current system time has crossed into a new day can be used as the trigger condition for the automatic shutdown process, so that the process is executed once in the early hours of each day.
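One way to express the cross-day trigger is to remember the calendar day of the last run and compare it with the current time. The sketch below is a minimal illustration; the function name and the idea of storing the last run day are assumptions, since the text only says that crossing into a new day is used as the trigger.

```python
from datetime import date, datetime

def crossed_midnight(last_run_day: date, now: datetime) -> bool:
    """Return True when `now` falls on a later calendar day than the last run."""
    return now.date() > last_run_day
```

A periodic loop that polls this check and updates the stored day after each run would execute the shutdown process once, shortly after midnight, each day.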
Optionally, the number of idle servers in the computer cluster may be acquired through a preset script command for that purpose.
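The script command itself is not given in the text. As a hedged illustration, the sketch below parses a listing with one `name state` pair per line and collects the servers whose state is exactly `idle`. On a cluster managed by SLURM such a listing might come from `sinfo -N -h -o "%N %t"`, but that choice of command is an assumption of this example, as is the function name.

```python
from typing import List

def idle_server_names(node_lines: str) -> List[str]:
    """Return the names of servers reported as idle, in first-seen order."""
    idle: List[str] = []
    for line in node_lines.splitlines():
        parts = line.split()
        # Keep only well-formed "name state" rows whose state is exactly "idle";
        # a server listed more than once is counted once.
        if len(parts) == 2 and parts[1] == "idle" and parts[0] not in idle:
            idle.append(parts[0])
    return idle
```

The number of idle servers is then simply the length of the returned list.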
Step 302: Judge, according to the number of idle servers, whether the computer cluster satisfies the idle-server shutdown condition: if yes, perform step 303; if no, perform step 304.
Optionally, judging whether the computer cluster satisfies the idle-server shutdown condition according to the number of idle servers may include: judging whether the number of idle servers is greater than a preset idle-server threshold; if yes, determining that the computer cluster satisfies the idle-server shutdown condition; if no, determining that the computer cluster does not satisfy the idle-server shutdown condition.
The preset idle-server threshold can be set according to business needs. For example, the preset idle-server threshold is 5.
It is judged whether the number of idle servers is greater than the preset idle-server threshold. If it is, the number of idle servers in the computer cluster is above the normal value: too many idle servers are being kept powered on, and the surplus idle servers need to be shut down to avoid wasting resources, so the computer cluster is determined to satisfy the idle-server shutdown condition. If the number of idle servers is less than or equal to the threshold, the number of idle servers is at or below the normal value: there is no surplus of idle servers kept powered on, and for the time being no idle server needs to be shut down, so the computer cluster is determined not to satisfy the idle-server shutdown condition.
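The shutdown condition reduces to one strict comparison, sketched below. The function name is hypothetical; the default threshold of 5 is the example value given above.

```python
def meets_shutdown_condition(idle_servers: int, idle_threshold: int = 5) -> bool:
    """True when the cluster has more idle servers than the threshold allows."""
    # Strict comparison: exactly `idle_threshold` idle servers is still normal.
    return idle_servers > idle_threshold
```

With the threshold at 5, a cluster with 7 idle servers satisfies the condition, while one with exactly 5 does not.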
Step 303: Obtain target idle servers from the idle servers of the computer cluster, control the target idle servers to perform a shutdown operation, and add each target idle server that has shut down successfully to the bootable server list of the computer cluster.
Optionally, obtaining the target idle servers from the idle servers of the computer cluster and controlling them to perform the shutdown operation may include: calculating the difference between the number of idle servers and the preset idle-server threshold; obtaining, from the idle servers of the computer cluster, a number of idle servers equal to the difference as the target idle servers, and removing the target idle servers from the resource pool; and performing the shutdown operation on the removed target idle servers.
The resource pool contains the servers in the computer cluster that are kept powered on. In this embodiment, the target idle servers that are to be shut down are removed from the resource pool promptly.
Optionally, the shutdown operation on a removed target idle server is completed by executing a shutdown function on it.
In a specific example, the preset idle-server threshold is 5 and the number of idle servers is 7. Because the number of idle servers is greater than 5, the number of idle servers in the computer cluster is above the normal value: too many idle servers are being kept powered on, and the surplus needs to be shut down to avoid wasting resources, so the computer cluster is determined to satisfy the idle-server shutdown condition. The difference between the number of idle servers and the preset idle-server threshold is 2. Two idle servers are obtained from the idle servers of the computer cluster as the target idle servers. The target idle servers are removed from the resource pool, and the shutdown operation is performed on them.
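The worked example (7 idle servers, threshold 5, difference 2) can be reproduced with the following sketch. The text does not say which idle servers are chosen, so taking the first ones in the list is an assumption, as are the function name and the server names in the usage note.

```python
from typing import List

def select_shutdown_targets(idle: List[str], idle_threshold: int) -> List[str]:
    """Pick as many idle servers as exceed the threshold (none if no surplus)."""
    surplus = len(idle) - idle_threshold
    if surplus <= 0:
        return []
    # Assumption: the first `surplus` idle servers become the targets.
    return idle[:surplus]
```

Given seven idle servers named `n1` to `n7` and a threshold of 5, the function returns `n1` and `n2`, leaving five idle servers in the resource pool.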
The bootable server list of the computer cluster maintains the available servers in the cluster that can be powered on. An available server is a server that can be powered on. A target idle server that has shut down successfully is an available server that can be powered on. Accordingly, each target idle server that is successfully shut down in the automatic shutdown process is added to the bootable server list of the computer cluster.
Step 304: Write the current resource status information of the computer cluster to the log file.
In this embodiment, if the computer cluster does not satisfy the idle-server shutdown condition, the current resource status information of the computer cluster is written to the log file, so that operation and maintenance personnel, when periodically reviewing the log file, can determine the cluster's resource status during the current automatic shutdown process.
Optionally, the current resource status information of the computer cluster includes the number of idle servers in the computer cluster.
This embodiment of the present invention provides a server management method. The method acquires the number of idle servers in a computer cluster and judges, according to that number, whether the computer cluster satisfies the idle-server shutdown condition. When the condition is satisfied, target idle servers are obtained from the idle servers of the computer cluster, the target idle servers are controlled to perform a shutdown operation, and each target idle server that has shut down successfully is added to the bootable server list of the computer cluster. Surplus idle servers can thus be shut down dynamically according to the number of idle servers, which saves power and keeps the power consumption of the whole computer cluster at a level that matches the computing workload, avoiding wasted resources.
Embodiment 4
FIG. 4 is a flowchart of a server management method according to Embodiment 4 of the present invention. This embodiment may be combined with the optional solutions of one or more of the foregoing embodiments. In this embodiment, judging whether the computer cluster satisfies the idle-server shutdown condition according to the number of idle servers may include: judging whether the number of idle servers is greater than a preset idle-server threshold; if yes, determining that the computer cluster satisfies the idle-server shutdown condition; if no, determining that the computer cluster does not satisfy the idle-server shutdown condition.
In addition, obtaining the target idle servers from the idle servers of the computer cluster and controlling them to perform the shutdown operation may include: calculating the difference between the number of idle servers and the preset idle-server threshold; obtaining, from the idle servers of the computer cluster, a number of idle servers equal to the difference as the target idle servers, and removing the target idle servers from the resource pool; and performing the shutdown operation on the removed target idle servers.
As shown in FIG. 4, the method of this embodiment specifically includes:
Step 401: Acquire the number of idle servers in the computer cluster.
For details not described in this embodiment, refer to the foregoing embodiments.
Step 402: Judge whether the number of idle servers is greater than the preset idle-server threshold: if yes, perform step 403; if no, perform step 406.
In this embodiment, the preset idle-server threshold can be set according to business needs. For example, the preset idle-server threshold is 5.
If the number of idle servers is greater than the preset idle-server threshold, the number of idle servers in the computer cluster is above the normal value: too many idle servers are being kept powered on, and the surplus idle servers need to be shut down to avoid wasting resources, so the computer cluster is determined to satisfy the idle-server shutdown condition. If the number of idle servers is less than or equal to the threshold, the number of idle servers is at or below the normal value: there is no surplus of idle servers kept powered on, and for the time being no idle server needs to be shut down, so the computer cluster is determined not to satisfy the idle-server shutdown condition.
Step 403: Calculate the difference between the number of idle servers and the preset idle-server threshold.
Step 404: Obtain, from the idle servers of the computer cluster, a number of idle servers equal to the difference as the target idle servers, and remove the target idle servers from the resource pool.
Step 405: Perform the shutdown operation on the removed target idle servers, and add each target idle server that has shut down successfully to the bootable server list of the computer cluster.
Optionally, the shutdown operation on a removed target idle server is completed by executing a shutdown function on it.
Step 406: Write the current resource status information of the computer cluster to the log file.
This embodiment of the present invention provides a server management method. Whether the computer cluster satisfies the idle-server shutdown condition is determined by judging whether the number of idle servers is greater than the preset idle-server threshold. When it is, the difference between the number of idle servers and the threshold is calculated, that number of idle servers is obtained from the idle servers of the computer cluster as the target idle servers, the target idle servers are removed from the resource pool, and the shutdown operation is performed on them. Surplus idle servers can thus be shut down dynamically according to the number of idle servers and the preset idle-server threshold, which saves power and keeps the power consumption of the whole computer cluster at a level that matches the computing workload, avoiding wasted resources.
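Steps 401 to 406 can be sketched as one function. This is an illustration under stated assumptions: the `shutdown` and `log` callables are hypothetical hooks, the first surplus servers in the list are taken as targets, the log messages are invented, and logging a failed shutdown is an addition of this example, since the text does not describe that case.

```python
from typing import Callable, List, Set

def auto_shutdown(idle: List[str], idle_threshold: int,
                  resource_pool: Set[str], bootable: List[str],
                  shutdown: Callable[[str], bool],
                  log: Callable[[str], None]) -> List[str]:
    """Run one pass of the automatic shutdown flow; return servers shut down."""
    if len(idle) <= idle_threshold:                 # step 402
        log(f"resources: idle={len(idle)}")         # step 406
        return []
    surplus = len(idle) - idle_threshold            # step 403
    targets = idle[:surplus]                        # step 404
    for server in targets:
        resource_pool.discard(server)               # remove from the resource pool
    shut_down: List[str] = []
    for server in targets:                          # step 405
        if shutdown(server):
            bootable.append(server)                 # now available for power-on
            shut_down.append(server)
        else:
            log(f"{server}: shutdown failed")       # assumption, see lead-in
    return shut_down
```

Servers appended to `bootable` here are exactly the ones the power-on flow of Embodiment 2 can later select as available servers.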
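Embodiment 5
FIG. 5 is a schematic structural diagram of a server management apparatus according to Embodiment 5 of the present invention. As shown in FIG. 5, the apparatus includes: a quantity acquisition module 501, a quantity judgment module 502, and a server power-on module 503.
The quantity acquisition module 501 is configured to acquire the number of queued tasks and the number of idle servers in a computer cluster. The quantity judgment module 502 is configured to judge whether the number of queued tasks is greater than a preset task-number threshold and whether the number of idle servers is less than a preset number of servers. The server power-on module 503 is configured to, if the number of queued tasks is greater than the preset task-number threshold and the number of idle servers is less than the preset number of servers, obtain a target available server from the bootable server list of the computer cluster and control the target available server to perform a power-on operation. The available servers in the bootable server list are servers that were successfully shut down in the automatic shutdown process.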
本发明实施例提供了一种服务器管理装置,通过获取计算机集群的排队任务数量和空闲服务器数量,然后判断排队任务数量是否大于预设任务数量阈值,且空闲服务器数量是否小于预设服务器数量;并在排队任务数量大于预设任务数量阈值,且空闲服务器数量小于预设服务器数量时,从计算机集群的可开机服务器列表中获取目标可用服务器,控制目标空闲服务器执行开机操作,可以根据排队任务数量和空闲服务器数量,确定计算机集群的当前计算任务申请资源的紧张情况,可以在根据排队任务数量和空闲服务器数量,确定计算机集群的当前计算任务申请资源比较紧张,需要自动开启服务器时,动态地开启合适数量的服务器,从而实现动态地根据计算机集群的当前计算任务申请资源的紧张情况,自动开启服务器,保障计算任务的及时处理,实现整个集群的功耗维持在与计算任务相适应的程度,避免资源浪费。An embodiment of the present invention provides a server management device, by acquiring the number of queued tasks and the number of idle servers in a computer cluster, and then determining whether the number of queued tasks is greater than a preset task number threshold, and whether the number of idle servers is less than the preset number of servers; and When the number of queued tasks is greater than the preset task number threshold, and the number of idle servers is less than the preset number of servers, the target available server is obtained from the list of bootable servers in the computer cluster, and the target idle server is controlled to perform the power-on operation. The number of idle servers is used to determine the tense situation of the current computing task application resources of the computer cluster. According to the number of queued tasks and the number of idle servers, it can be determined that the current computing task application resources of the computer cluster are relatively tight, and the server needs to be automatically started. The number of servers, so as to dynamically apply for resources according to the current computing tasks of the computer cluster, automatically start the server, ensure the timely processing of computing tasks, maintain the power consumption of the entire cluster at a level suitable for the computing tasks, and avoid resources waste.
In an optional implementation of this embodiment, the server management apparatus may further include a server judgment module, configured to determine whether an available server exists in the bootable server list of the computer cluster.
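In an optional implementation of this embodiment, the quantity acquisition module 501 may include a timed quantity acquisition unit, configured to acquire the number of queued tasks and the number of idle servers in the computer cluster periodically, at a preset power-on time interval.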
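Timed acquisition at a preset interval might look like the loop below. It is a sketch under assumptions: `read_counts` is a hypothetical callback returning the pair (queued tasks, idle servers), `on_sample` is whatever consumes each sample, and `max_rounds` exists only so that the example terminates.

```python
import time
from typing import Callable, Tuple


def poll_cluster(
    read_counts: Callable[[], Tuple[int, int]],
    on_sample: Callable[[int, int], None],
    interval_seconds: float,
    max_rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Sample (queued tasks, idle servers) once per interval.

    max_rounds bounds the loop so the sketch terminates; a long-running
    service would presumably loop until it is stopped.
    """
    rounds = 0
    while rounds < max_rounds:
        queued, idle = read_counts()
        on_sample(queued, idle)
        rounds += 1
        if rounds < max_rounds:
            sleep(interval_seconds)  # the preset power-on time interval
    return rounds
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.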
In an optional implementation of this embodiment, the server management apparatus may further include: a power-on judgment module, configured to determine, after waiting for a preset power-on time period, whether the target available server has been powered on successfully; a server initialization module, configured to perform an initialization operation on the target available server if it has been powered on successfully; an initialization judgment module, configured to determine, after waiting for a preset initialization time period, whether the target available server has been initialized successfully; and an information writing module, configured to write the normal-online information of the target available server to a log file if the target available server has been initialized successfully.
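The sequence of waiting, checking, initializing, and logging can be sketched as below. The function `bring_online` and its callbacks (`is_powered_on`, `initialize`, `is_initialized`) are hypothetical names, and returning `False` on failure is an assumption, since this passage does not say what happens when a check fails. The standard `logging` module is used; writing to a file would require attaching a `logging.FileHandler`, which is omitted here.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("server_manager")


def bring_online(
    server: str,
    is_powered_on: Callable[[str], bool],
    initialize: Callable[[str], None],
    is_initialized: Callable[[str], bool],
    boot_wait_seconds: float,
    init_wait_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Follow a power-on request through to a logged 'online' record."""
    sleep(boot_wait_seconds)  # preset power-on time period
    if not is_powered_on(server):
        return False
    initialize(server)
    sleep(init_wait_seconds)  # preset initialization time period
    if not is_initialized(server):
        return False
    logger.info("server %s is online", server)
    return True
```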
In an optional implementation of this embodiment, the server management apparatus may further include: an idle quantity acquisition module, configured to acquire the number of idle servers in the computer cluster; a shutdown condition judgment module, configured to determine, according to the number of idle servers, whether the computer cluster satisfies an idle-server shutdown condition; and a server shutdown module, configured to, if the computer cluster satisfies the idle-server shutdown condition, acquire a target idle server from the idle servers of the computer cluster, control the target idle server to perform a shutdown operation, and add each target idle server that is shut down successfully to the bootable server list of the computer cluster.
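The shutdown side, which is what populates the bootable server list, can be sketched in the same style. The exact idle-server shutdown condition is not given in this passage, so the `keep_idle` parameter below is an assumed form of it: servers are shut down only while more than `keep_idle` of them are idle. The function name and the `shut_down` callback are likewise hypothetical.

```python
from typing import Callable, List


def shut_down_idle_servers(
    idle_servers: List[str],
    bootable_servers: List[str],
    keep_idle: int,
    shut_down: Callable[[str], bool],
) -> List[str]:
    """Shut down surplus idle servers and record the successful ones.

    keep_idle is an assumed form of the shutdown condition: the cluster
    qualifies only while more than keep_idle servers are idle.
    """
    surplus = max(len(idle_servers) - keep_idle, 0)
    stopped = []
    for server in idle_servers[:surplus]:
        if shut_down(server):  # True means the shutdown succeeded
            bootable_servers.append(server)
            stopped.append(server)
    return stopped
```

Only servers whose shutdown succeeded are appended, which matches the requirement that the bootable list contain servers that were successfully shut down by the automatic process.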
For the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.
The above server management apparatus can execute the server management method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.
Embodiment 6
FIG. 6 is a schematic structural diagram of a computer device according to Embodiment 6 of the present invention. FIG. 6 shows a block diagram of an exemplary computer device 12 suitable for implementing embodiments of the present invention. The computer device 12 shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 6, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to: one or more processors 16, a memory 28, and a bus 18 that connects the different system components (including the memory 28 and the processors 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer device 12, including volatile and non-volatile media, and removable and non-removable media.
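The memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 6 and commonly called a "hard disk drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (for example, a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.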
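A program/utility 40 having a set of (at least one) program modules 42 may be stored in, for example, the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.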
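The computer device 12 may also communicate with one or more external devices 14 (for example, a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (for example, a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the computer device 12 may communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in FIG. 6, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.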
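By running the programs stored in the memory 28, the processor 16 executes various functional applications and data processing, and thereby implements the server management method provided by the embodiments of the present invention: acquiring the number of queued tasks and the number of idle servers in a computer cluster; determining whether the number of queued tasks is greater than a preset task-number threshold and whether the number of idle servers is less than a preset server number; and, if the number of queued tasks is greater than the preset task-number threshold and the number of idle servers is less than the preset server number, acquiring a target available server from the bootable server list of the computer cluster and controlling the target available server to perform a power-on operation. The available servers in the bootable server list are servers that were successfully shut down during the automatic shutdown process.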
Embodiment 7
Embodiment 7 of the present invention provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the server management method provided by the embodiments of the present invention: acquiring the number of queued tasks and the number of idle servers in a computer cluster; determining whether the number of queued tasks is greater than a preset task-number threshold and whether the number of idle servers is less than a preset server number; and, if the number of queued tasks is greater than the preset task-number threshold and the number of idle servers is less than the preset server number, acquiring a target available server from the bootable server list of the computer cluster and controlling the target available server to perform a power-on operation. The available servers in the bootable server list are servers that were successfully shut down during the automatic shutdown process.
Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, and so on, or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or computer device. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its concept; the scope of the present invention is determined by the scope of the appended claims.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010760328.9A CN111930502A (en) | 2020-07-31 | 2020-07-31 | Server management method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111930502A (en) | 2020-11-13 |
Family
ID=73315098
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010760328.9A Pending CN111930502A (en) | 2020-07-31 | 2020-07-31 | Server management method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111930502A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2292040A1 (en) * | 1999-03-25 | 2000-09-25 | International Business Machines Corporation | Interface system and method for asynchronously updating a shared resource |
JP2008047096A (en) * | 2006-08-14 | 2008-02-28 | Fuji Xerox Co Ltd | Computer system, method, and program for queuing |
CN103645956A (en) * | 2013-12-18 | 2014-03-19 | 浪潮电子信息产业股份有限公司 | Intelligent cluster load management method |
CN110764892A (en) * | 2019-10-22 | 2020-02-07 | 北京字节跳动网络技术有限公司 | Task processing method, device and computer readable storage medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112668882A (en) * | 2020-12-29 | 2021-04-16 | 浙江科钛机器人股份有限公司 | Autonomous survival detection and distributed coordination method for mobile robot cluster |
CN112668882B (en) * | 2020-12-29 | 2024-04-16 | 浙江科钛机器人股份有限公司 | Mobile robot cluster autonomous survival detection and distributed coordination method |
CN114443297A (en) * | 2022-01-21 | 2022-05-06 | 北京金山云网络技术有限公司 | Computing task processing method, device, storage medium and electronic device |
CN119030970A (en) * | 2024-10-28 | 2024-11-26 | 成都掠食鸟科技有限公司 | A system for remote file transfer and storage via remote equipment |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
KR101512252B1 (en) | Method of provisioning firmware in an operating system (os) absent services environment | |
US8296553B2 (en) | Method and system to enable fast platform restart | |
US9600294B2 (en) | Port throttling across an operating system restart during a hot upgrade | |
US7584374B2 (en) | Driver/variable cache and batch reading system and method for fast resume | |
US20120036383A1 (en) | Power supply for networked host computers and control method thereof | |
US20100079472A1 (en) | Method and systems to display platform graphics during operating system initialization | |
US10860363B1 (en) | Managing virtual machine hibernation state incompatibility with underlying host configurations | |
CN105765534A (en) | Virtual computing systems and methods | |
CN111930502A (en) | Server management method, device, equipment and storage medium | |
US8972964B2 (en) | Dynamic firmware updating system for use in translated computing environments | |
US10649832B2 (en) | Technologies for headless server manageability and autonomous logging | |
WO2025118803A1 (en) | Server operation starting method and device, server, and storage medium | |
CN110851384B (en) | Interrupt processing method, system and computer readable storage medium | |
US10996942B1 (en) | System and method for graphics processing unit firmware updates | |
US11516082B1 (en) | Configuration of a baseboard management controller (BMC) group leader responsive to load | |
US10394619B2 (en) | Signature-based service manager with dependency checking | |
US9430265B1 (en) | System and method for handling I/O timeout deadlines in virtualized systems | |
CN110502267A (en) | Update method, device, equipment and the storage medium of appliance applications | |
US8060605B1 (en) | Systems and methods for evaluating the performance of remote computing systems | |
CN111741130A (en) | Server management method, device, equipment and storage medium | |
US12367056B2 (en) | Reliable device assignment for virtual machine based containers | |
US10104619B2 (en) | Retrieval of a command from a management server | |
US20230359533A1 (en) | User Triggered Virtual Machine Cloning for Recovery/Availability/Scaling | |
CN115509590B (en) | Continuous deployment method and computer equipment | |
EP3326062B1 (en) | Mitigation of the impact of intermittent unavailability of remote storage on virtual machines |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20201113 |