Detailed Description
In order to make the technical solution of the present invention better understood by those skilled in the art, the technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The following explains key terms appearing in the present invention.
Kubernetes (K8s for short) is an open-source container orchestration platform for automated deployment, scaling, and management of containerized applications. A Kubernetes cluster consists of two classes of nodes, control plane nodes (Control Plane) and worker nodes (Worker Nodes), working cooperatively to achieve high availability and resiliency of applications.
1. Control plane (Master node)
Function: the "brain" of the cluster, responsible for global decisions and coordination.
Core components:
API Server: the entry point of the cluster; receives user instructions and communicates with the other components.
etcd: a distributed key-value store that saves all configuration and state data of the cluster.
Controller Manager: monitors cluster state to ensure the system operates as intended (e.g., failover, replica count maintenance).
Scheduler: assigns newly created Pods to appropriate Worker nodes.
2. Worker node (Worker Nodes)
Function: runs the containers of user applications.
Core components:
kubelet: communicates with the Control Plane and manages the Pod lifecycle on the node.
kube-proxy: maintains network rules on the node, implementing service discovery and load balancing.
Container runtime: such as Docker or containerd; responsible for pulling images and running containers.
3. Pod (minimum deployment unit)
A Pod contains one or more containers (e.g., a main container plus Sidecar containers) that share network and storage resources.
Analogy: a Pod is like a "work team" in which each container is a team member, collaborating to complete the task.
The Kubernetes cluster resource dynamic management method provided by the embodiment of the invention is executed by computer equipment, and correspondingly, the Kubernetes cluster resource dynamic management system runs in the computer equipment.
FIG. 1 is a schematic flow chart of a method of one embodiment of the invention. The execution body of FIG. 1 may be a Kubernetes cluster resource dynamic management system. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
As shown in FIG. 1, the method includes:
S1, monitoring hardware resources of physical nodes of the cluster.
In a Linux system, the top command is used to check CPU usage in real time, free -m to check memory usage, iostat to monitor disk I/O, and ifconfig or ip addr to check the network interface state. Timing scripts may be written, for example using a cron task, to execute these commands at intervals and save the output to a log file.
SNMP protocol: an SNMP agent is deployed on the physical node, and an SNMP management station (such as Nagios or Zabbix) sends requests to acquire hardware resource information of the node, such as CPU utilization, memory utilization, and disk space.
Hardware monitoring interface: for server-level hardware, the IPMI (Intelligent Platform Management Interface) protocol can be used to obtain hardware sensor data, including temperature and fan speed, through an IPMI tool (such as ipmitool).
The collected hardware resource data is stored in a time-series database, such as Prometheus or InfluxDB. These databases are dedicated to processing time-series data and support efficient storage and querying.
S2, monitoring load data of the cluster, and generating a predicted load based on the load data.
Load data monitoring:
Container orchestration platform: if Kubernetes is used as the container orchestration platform, load data of Pods and Nodes in the cluster can be collected through kube-state-metrics and node-exporter, including CPU utilization, memory utilization, network traffic, etc.
Application monitoring: monitoring clients (e.g., Prometheus client libraries) are integrated into the application to collect application-specific load metrics such as request response time and throughput.
Log analysis: application logs are collected and analyzed by an ELK Stack (Elasticsearch, Logstash, Kibana) or EFK Stack (Elasticsearch, Fluentd, Kibana), from which load-related information such as error rates and traffic volume is extracted.
Predictive load generation:
Machine learning algorithms: the historical load data is used to train machine learning models such as ARIMA (autoregressive integrated moving average) and LSTM (long short-term memory network). The historical load data is divided into a training set and a test set; the training set is used to train the model, and the test set is then used to evaluate the accuracy of the model.
Model training and tuning: the machine learning models are implemented using Python machine learning libraries (e.g., scikit-learn, TensorFlow, PyTorch). The prediction accuracy of the model is improved by adjusting its hyperparameters (such as the learning rate and the number of hidden layers).
Real-time prediction: the trained model is deployed into the production environment, receives the latest load data in real time, and generates the predicted load.
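As a non-limiting illustration of this real-time prediction loop, the following Python sketch uses a simple linear-trend extrapolator as a stand-in for the trained ARIMA/LSTM model; the class name and window size are hypothetical, and a production deployment would load the trained model instead:

```python
from collections import deque

class LoadPredictor:
    """Simplified stand-in for the trained time-series model (illustrative only).

    A production deployment would load a trained ARIMA/LSTM model; here a
    linear trend over the last `window` samples demonstrates the loop of
    receiving the latest load data and emitting a predicted load.
    """

    def __init__(self, window=4):
        self.history = deque(maxlen=window)

    def observe(self, load):
        # Receive the latest load sample in real time.
        self.history.append(float(load))

    def predict_next(self):
        # Extrapolate one step ahead using the average first difference.
        h = list(self.history)
        if len(h) < 2:
            return h[-1] if h else 0.0
        avg_delta = (h[-1] - h[0]) / (len(h) - 1)
        return h[-1] + avg_delta

predictor = LoadPredictor(window=4)
for sample in [40.0, 44.0, 48.0, 52.0]:  # e.g., CPU utilization %
    predictor.observe(sample)
print(predictor.predict_next())  # rising trend extrapolates to 56.0
```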
S3, dynamically adjusting the resource quota of a namespace or creating a new namespace according to the pre-configuration rules and the hardware resources.
Preconfiguration rule definition:
Configuration files: rules are defined using configuration files in YAML or JSON format, e.g., setting different namespace resource quota policies according to different time periods, hardware resource utilization, and predicted loads.
Rules engine: a rules engine (e.g., Drools) is used to implement complex rule logic, dynamically adjusting the resource quota of a namespace according to hardware resources and predicted loads.
Resource quota adjustment:
Kubernetes API: using a Kubernetes API client (e.g., the Python kubernetes-client library), the resource quota of a namespace, such as CPU requests and limits and memory requests and limits, is dynamically adjusted according to the pre-configuration rules and the hardware resource information.
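The quota adjustment described above can be sketched as follows. The function builds a plain-dict manifest mirroring the Kubernetes v1 ResourceQuota schema; the function name and quota values are illustrative, and the actual API call through a Kubernetes client is cluster-specific and omitted here:

```python
def build_resource_quota(namespace, cpu_request, cpu_limit,
                         mem_request_gi, mem_limit_gi):
    """Build a v1 ResourceQuota manifest (plain dict) for the given namespace.

    Applying the manifest (e.g., through the Python kubernetes-client) is
    cluster-specific and omitted; the quota name is illustrative.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "dynamic-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": str(cpu_request),
                "limits.cpu": str(cpu_limit),
                "requests.memory": f"{mem_request_gi}Gi",
                "limits.memory": f"{mem_limit_gi}Gi",
            }
        },
    }

quota = build_resource_quota("team-a", 20, 30, 64, 96)
print(quota["spec"]["hard"]["limits.cpu"])  # "30"
```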
Automatic scaling policy: when hardware resources are sufficient and the predicted load increases, the resource quota of the namespace is increased; when hardware resources are tight and the predicted load decreases, the resource quota of the namespace is decreased.
New namespace creation:
Resource assessment: based on the hardware resources and the predicted load, assess whether a new namespace needs to be created. If the existing namespaces cannot meet the requirements of the predicted load and the hardware resources allow, a new namespace is created.
Namespace creation: a new namespace is created using the Kubernetes API, and an initial resource quota is allocated to it according to the pre-configured rules.
S4, automatically scaling the container group replica count through the HPA based on the predicted load and the namespace.
HPA configuration:
Metric definition: in Kubernetes, the HorizontalPodAutoscaler (HPA) resource object is used to define the metrics for automatic scaling, such as CPU utilization, memory utilization, or custom metrics (e.g., the request response time of an application).
Target value setting: a target value is set for each metric, and the HPA automatically adjusts the container group replica count when the actual metric value exceeds or falls below the target value.
Automatically scaling the container group replica count:
HPA controller: the Kubernetes HPA controller periodically monitors the metric values and automatically adjusts the container group replica count according to the comparison between the metric value and the target value. When the predicted load increases, the HPA increases the replica count to meet the load demand; when the predicted load decreases, the HPA decreases the replica count to save resources.
Replica adjustment policy: HPA adjustment policies, such as the maximum and minimum replica counts and the adjustment interval, may be set to avoid frequent scale-up and scale-down operations.
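For reference, the replica calculation the HPA controller applies can be sketched as below; it mirrors the documented rule desired = ceil(current × currentMetric / targetMetric), clamped to the configured bounds (the bound values are illustrative):

```python
import math

def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10):
    # desired = ceil(current * currentMetric / targetMetric), clamped to
    # the configured minimum and maximum replica bounds.
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(desired, max_replicas))

# 5 replicas running at 90% CPU against a 60% target -> scale out to 8.
print(hpa_desired_replicas(5, 90, 60, min_replicas=2, max_replicas=20))  # 8
```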
Adjusting container resource limits:
Dynamic resource allocation: in addition to adjusting the container group replica count, the resource limits of the containers (such as CPU and memory limits) can be dynamically adjusted according to the predicted load and the namespace resource quota. The resource request and limit fields of a container may be updated using the Kubernetes API.
Resource optimization: by dynamically adjusting container resource limits, containers can make reasonable use of the namespace resources and improve resource utilization while meeting the load demand.
At present, static configuration of Kubernetes cluster resources cannot satisfy high-concurrency load scenarios, so a dynamic resource pool is constructed:
1. Converting the hardware resources of the clusters into computing units which can be identified by the container groups, wherein the computing units form a dynamic resource pool;
Hardware information collection: detailed hardware information is collected from the cluster nodes, including parameters such as CPU model, core count, frequency, GPU vendor, video memory capacity, CUDA core count, memory type, capacity, and bandwidth.
Equivalent calculation: a unified calculation formula is defined according to the hardware type and vendor characteristics. For example:
CPU equivalent units = core count × base frequency (GHz) × vendor coefficient (e.g., 1.2 for Intel).
GPU equivalent units = video memory capacity (GB) × CUDA core count × vendor coefficient (e.g., 1.5 for NVIDIA).
Memory equivalent units = capacity (GB) × bandwidth (MHz) × type coefficient (e.g., 1.0 for DDR4).
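The equivalent-unit formulas above can be expressed directly in Python; the coefficients and example values are those given in the text, while the function names are illustrative:

```python
def cpu_units(cores, freq_ghz, vendor_coeff=1.0):
    # CPU equivalent units = core count x base frequency (GHz) x vendor coefficient
    return cores * freq_ghz * vendor_coeff

def gpu_units(vram_gb, cuda_cores, vendor_coeff=1.0):
    # GPU equivalent units = video memory (GB) x CUDA core count x vendor coefficient
    return vram_gb * cuda_cores * vendor_coeff

def mem_units(capacity_gb, bandwidth_mhz, type_coeff=1.0):
    # Memory equivalent units = capacity (GB) x bandwidth (MHz) x type coefficient
    return capacity_gb * bandwidth_mhz * type_coeff

# A 32-core Intel CPU at 2.5 GHz with the 1.2 vendor coefficient:
print(cpu_units(32, 2.5, 1.2))  # 96.0
```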
The total resources of a node are split layer by layer according to a binary tree structure, with the resources equally divided into a left half and a right half each time until they cannot be split further. The final leaf nodes represent the smallest allocatable computing-unit combinations.
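A minimal sketch of this binary-tree splitting, assuming integer resource totals and floor/ceil halves for odd counts:

```python
def split_units(total, min_unit=1):
    """Recursively halve a node's resource total (floor/ceil halves for odd
    counts) until each leaf holds the smallest allocatable unit."""
    if total <= min_unit:
        return [total]
    left = total // 2
    right = total - left
    return split_units(left, min_unit) + split_units(right, min_unit)

# A 32-core node decomposes into 32 one-core leaf units.
leaves = split_units(32)
print(len(leaves), set(leaves))  # 32 {1}
```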
An identification tag is added to each node according to the types and numbers of computing units it contains. For example, if a node contains a 32-core CPU, 1 GPU, and 128 GB of memory, its labels are "cpu=32; gpu=1; mem=128GB".
2. Establishing a mapping relation between container groups and computing units according to the computing units allocated to each container group.
Each container group is dynamically bound to the corresponding physical node resources according to the number of equivalent computing units it requests. For example, if a container group applies for 2 CPU units and 0.5 GPU units, the system allocates a matching combination of computing units from the available resources. A mapping table is maintained to record the correspondence between container groups and computing units, including the requested quantity, the actually allocated quantity, and hardware affinity rules.
Combining resource availability, node priority, and network delay, an adaptation score is calculated for scheduling the container group to each node. For example, the more available resources, the lower the node load, and the smaller the network delay, the higher the node's adaptation score. Cluster resource pressure is monitored, and when the node resource utilization is detected to exceed a threshold (e.g., 85%), container group migration is triggered and computing units are reassigned to balance the load.
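A possible scoring sketch, with hypothetical weights, illustrating that more free resources, lower load, and smaller network delay raise a node's adaptation score, and that migration triggers above the 85% utilization threshold:

```python
def adaptation_score(free_ratio, load_ratio, latency_ms,
                     w_free=0.5, w_load=0.3, w_lat=0.2):
    # More free resources, lower load, and smaller network delay all raise
    # the score; the weights are illustrative, not prescribed by the method.
    latency_term = 1.0 / (1.0 + latency_ms)
    return w_free * free_ratio + w_load * (1.0 - load_ratio) + w_lat * latency_term

def needs_migration(utilization, threshold=0.85):
    # Trigger container group migration above the utilization threshold.
    return utilization > threshold

nodes = {"node-a": adaptation_score(0.7, 0.3, 1.0),
         "node-b": adaptation_score(0.2, 0.9, 9.0)}
print(max(nodes, key=nodes.get))  # node-a
print(needs_migration(0.9))       # True
```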
3. Establishing an access control topology among the user information, container groups, and computing units according to the user information corresponding to each container group.
Access rights to the computing units are defined according to user roles (e.g., administrator, developer) and the project to which the user belongs. For example, a user may only be allowed to use computing units of GPU type A100. Whether a user request satisfies constraint conditions such as resource quota, hardware type, and time window is verified in real time. For example, a project may be limited to only 50% of the computing resources during non-working hours.
Through the combination of namespaces and tags, it is ensured that the computing units of different users or projects do not interfere with each other. For example, an independent namespace is created for financial transactions, associated only with encrypted GPU computing units. All resource allocation, release, and rights change operations, including the operator, time, target resources, and execution results, are recorded for compliance review and fault tracing.
In one embodiment of the invention, based on step S1, a possible embodiment thereof will be given below as a non-limiting illustration.
The node resource configuration information is acquired as follows:
Node CPU capacity: status.capacity.cpu;
Node memory capacity: status.capacity.memory;
Node GPU cards: status.capacity["nvidia.com/gpu"];
Node allocatable storage capacity: status.allocatable["ephemeral-storage"];
Node Pod capacity: status.capacity.pods.
In one embodiment of the present invention, based on step S2, a possible embodiment thereof will be given below as a non-limiting illustration.
S201, periodically monitoring the load data of each container group, and arranging the load data into load time-series data according to the monitoring time.
If Kubernetes is used as the container orchestration platform, kube-state-metrics and node-exporter can be used to collect load data of the container groups (Pods). kube-state-metrics provides state information about Kubernetes resource objects (e.g., Pod, Deployment), while node-exporter collects system metrics at the node level.
Prometheus is an open-source monitoring and alerting tool that integrates seamlessly with Kubernetes. By deploying Prometheus and the related exporters in the Kubernetes cluster, load data of the container groups, such as CPU usage, memory usage, and network traffic, can be collected periodically.
An appropriate data collection frequency is determined according to actual requirements and system performance. For example, a system with high real-time requirements may collect once every minute, while a system with low real-time requirements may collect once every 5 or 10 minutes. The collection frequency is configured through the scrape-interval setting of Prometheus.
The collected load data is stored in a time-series database, such as Prometheus's own storage or InfluxDB. These databases are dedicated to processing time-series data and support efficient storage and querying. In Prometheus, the data is stored in the form of timestamps and metric values, facilitating subsequent chronological sorting.
The load data of each container group is queried from the time-series database and sorted by monitoring time to form load time-series data. The sorting may be implemented using the query functions of the database, such as range queries and sort operations in Prometheus.
S202, inputting the load time-series data into a pre-trained LSTM model to obtain the predicted load of the corresponding container group.
The load time-series data is normalized; common methods are Min-Max normalization and Z-Score normalization. Normalization scales the data to a fixed range, which helps improve the training effect and convergence speed of the LSTM model.
The load time-series data is divided into an input sequence and a target sequence. For example, the data of the first n time steps is selected as the input, and the data of the next time step as the target. Sequence partitioning may be implemented using the Python numpy library.
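The normalization and sequence-partitioning steps can be sketched in plain Python (without numpy) as follows:

```python
def min_max_normalize(series):
    # Scale the load series into the [0, 1] interval (Min-Max normalization).
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in series]

def make_sequences(series, n_steps):
    # Split the series into (input window of n_steps, next-step target) pairs.
    return [(series[i:i + n_steps], series[i + n_steps])
            for i in range(len(series) - n_steps)]

data = min_max_normalize([10, 20, 30, 40, 50])
pairs = make_sequences(data, n_steps=3)
print(pairs[0])  # ([0.0, 0.25, 0.5], 0.75)
```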
The pre-trained LSTM model is loaded. A deep learning framework (e.g., TensorFlow, PyTorch) may be used to save and load the model.
The preprocessed load time-series data is input into the LSTM model to obtain the predicted load of the corresponding container group.
LSTM (long short-term memory network) is a special recurrent neural network (RNN) capable of handling long-term dependencies in sequence data. When processing load time-series data, the LSTM model learns patterns and trends from the input historical load data. The model continuously updates its internal state (i.e., memory cells) through inputs over multiple time steps, and can thereby capture long-term dependency information in the data. During prediction, the model outputs the predicted load value of the next time step according to the current input sequence and its internal state. By continually feeding new input sequences into the model, continuous prediction of future loads can be achieved.
In one embodiment of the present invention, based on step S3, a possible embodiment thereof will be given below as a non-limiting illustration.
S301, determining resource requirements according to the predicted load.
The correspondence between historical loads and actual resource consumption is recorded to construct a feature library. Polynomial regression or a neural network is used to fit the load-resource relationship curve.
The resource demand is determined according to the load-resource relationship curve and the predicted load.
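As a simplified stand-in for the polynomial-regression fit, a least-squares linear fit of the load-resource curve can be sketched as:

```python
def fit_linear(loads, resources):
    # Least-squares fit of resources = a * load + b from historical records.
    n = len(loads)
    mean_x = sum(loads) / n
    mean_y = sum(resources) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, resources))
    var = sum((x - mean_x) ** 2 for x in loads)
    a = cov / var
    return a, mean_y - a * mean_x

def resource_demand(predicted_load, a, b):
    # Map a predicted load onto the fitted load-resource curve.
    return a * predicted_load + b

# History: 100 req/s consumed 2 cores, 200 -> 4 cores, 300 -> 6 cores.
a, b = fit_linear([100, 200, 300], [2, 4, 6])
print(resource_demand(400, a, b))  # 8.0
```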
S302, when the remaining capacity of the cluster can meet the resource requirement, adjusting the existing namespace resource quota according to the pre-configuration rules and the hardware resources.
The pre-configuration rules include:
CPU: the input value must be in the range from 1 to the sum of the CPU capacity (status.capacity.cpu) of all authorized nodes;
Container group default CPU: the input value must be in the range from 1 to the maximum CPU capacity (status.capacity.cpu) among all authorized nodes, and less than the namespace CPU maximum above;
Memory: the input value must be in the range from 1 to the sum of the memory capacity (status.capacity.memory) of all authorized nodes;
Container group default memory: the input value must be in the range from 1 to the maximum memory capacity (status.capacity.memory) among all authorized nodes, and less than the namespace memory maximum above;
Storage: the input value must be in the range from 1 to the sum of the allocatable storage capacity (status.allocatable["ephemeral-storage"]) of all authorized nodes;
Volume count: the input value must be in the range from 1 to the number of authorized nodes multiplied by 23;
Container group count: the input value must be in the range from 1 to the sum of the Pod capacity (status.capacity.pods) of all authorized nodes;
GPU cards: the input value must be in the range from 1 to the sum of status.capacity["nvidia.com/gpu"] of all authorized nodes.
If a quota item is not entered, the corresponding namespace resource is not limited; the namespace resource quota is then dynamically adjusted, or a new namespace created, according to the pre-configuration rules and the hardware resources.
The rule system realizes precise control of resource allocation through a double-layer constraint model (node physical capacity + namespace quota), effectively balancing resource utilization and system stability requirements.
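A minimal validator for the "1 ≤ input value ≤ capacity-derived upper bound" rule pattern above; the function name and node counts are illustrative:

```python
def validate_quota(value, upper_bound, item="cpu"):
    # Enforce the rule pattern: 1 <= input value <= capacity-derived bound.
    if not 1 <= value <= upper_bound:
        raise ValueError(f"{item} quota {value} outside [1, {upper_bound}]")
    return value

# Upper bound: sum of status.capacity.cpu over three 32-core authorized nodes.
cpu_upper = 32 * 3
print(validate_quota(64, cpu_upper, "cpu"))  # 64
try:
    validate_quota(120, cpu_upper, "cpu")
except ValueError as err:
    print("rejected:", err)
```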
S303, when the resource requirement exceeds the remaining capacity of the cluster or forced isolation is needed, creating a new namespace.
When an existing namespace has reached its hard quota upper limit (e.g., the total resources of the physical nodes), quota expansion is prohibited and a new namespace must be created.
In one embodiment of the present invention, based on step S4, a possible embodiment thereof will be given below as a non-limiting illustration.
Calculating the optimal container group replica count:
OptimalReplicas = floor[(X×α + Y(t)×β + Z(t)×γ)/U]
where X is the number of computing units allocated to the namespace, Y(t) is the number of real-time available computing units of the cluster, Z(t) is the predicted load, α is the hard-limit weight, β is the real-time resource weight, γ is the historical load weight, and U is the computing-unit consumption of a single replica.
The replica count of the corresponding container group is then scaled to this optimal replica count.
In Kubernetes, horizontalPodAutoscaler (HPA) resource objects are configured, defining metrics (e.g., CPU utilization, memory utilization, etc.) and target values for automatic expansion. The HPA controller of Kubernetes periodically monitors the index value and automatically adjusts the number of copies of the container group based on the comparison of the index value to the target value. When the calculated optimal cost is different from the current cost, the HPA can automatically adjust. Meanwhile, the resource limitation of the container may also be adjusted by adjusting the target value of the HPA or the resource request and limitation fields. The HPA monitors a specific indicator of the container group (e.g., CPU utilization) and compares it with a preset target value. When the index value exceeds or falls below the target value, the HPA automatically adjusts the number of copies of the container group to ensure that the resource usage of the container group can meet the load demand, while avoiding waste of resources. After the optimal number of copies is calculated, the HPA dynamically adjusts the number of copies to achieve the optimal configuration according to the result and the current monitoring index condition. Meanwhile, the resource limitation of the container can be dynamically adjusted by adjusting the related parameters of the HPA, so that the resource utilization is further optimized.
In one example, an application instance/workload is created and edited, and the container group replica count and the container quota maximum limits are dynamically adjusted based on the comparison between the namespace quota and the node allocatable resources.
1) Node allocatable resource acquisition:
Node allocatable Pod count: Pod capacity − allocated Pod count;
Node allocatable CPU requests: CPU capacity − allocated CPU requests;
Node allocatable CPU limits: CPU capacity − allocated CPU limits;
Node allocatable memory requests: memory capacity − allocated memory requests;
Node allocatable memory limits: memory capacity − allocated memory limits;
Node allocatable GPU cards: status.capacity["nvidia.com/gpu"] − allocated GPU cards.
2) Dynamically setting the workload replica count and resource quotas when issuing the workload:
CPU request: the namespace CPU quota is compared with the maximum allocatable CPU among the authorized nodes and the smaller value is taken; if the namespace has no CPU quota configured, it is set to the maximum allocatable CPU of the authorized nodes;
CPU limit: the maximum value is linked with the CPU request; when the CPU request is set, the same CPU limit is set, and the CPU limit may then be changed to a value larger than the CPU request;
Memory request: the namespace memory quota is compared with the maximum allocatable memory among the authorized nodes and the smaller value is taken; if the namespace has no memory quota configured, it is set to the maximum allocatable memory of the authorized nodes;
Memory limit: the maximum value is linked with the memory request; when the memory request is set, the same memory limit is set, and the memory limit may then be changed to a value larger than the memory request;
GPU limit: the namespace GPU quota is compared with the maximum allocatable GPU among the authorized nodes and the smaller value is taken; if the namespace has no GPU quota configured, it is set to the maximum allocatable GPU of the authorized nodes.
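The "take the smaller of the namespace quota and the node allocatable maximum, falling back to the allocatable maximum when no quota is configured" pattern can be sketched as:

```python
def effective_request(namespace_quota, node_allocatable_max):
    # Take the smaller of the namespace quota and the maximum allocatable
    # value among authorized nodes; with no quota configured (None), fall
    # back to the node allocatable maximum.
    if namespace_quota is None:
        return node_allocatable_max
    return min(namespace_quota, node_allocatable_max)

print(effective_request(20, 32))    # quota constrains the request -> 20
print(effective_request(None, 32))  # no quota configured -> 32
```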
The namespace resource quota is dynamically updated as the cluster adds or deletes nodes. If a node is moved out, the user is prompted that the [XXXX1, XXXX2] namespace resource quotas will be dynamically updated.
Referring to FIG. 2, in one embodiment, the method specifically includes the following modules:
Namespace resource management module:
When creating and editing a namespace, the namespace resource quota is dynamically set according to the resource capacity of the cluster nodes.
1) Node resource configuration information acquisition:
Node CPU capacity: status.capacity.cpu;
Node memory capacity: status.capacity.memory;
Node GPU cards: status.capacity["nvidia.com/gpu"];
Node allocatable storage capacity: status.allocatable["ephemeral-storage"];
Node Pod capacity: status.capacity.pods.
2) Namespace quota configuration:
CPU: the input value must be in the range from 1 to the sum of the CPU capacity (status.capacity.cpu) of all authorized nodes;
Container group default CPU: the input value must be in the range from 1 to the maximum CPU capacity (status.capacity.cpu) among all authorized nodes, and less than the namespace CPU maximum above;
Memory: the input value must be in the range from 1 to the sum of the memory capacity (status.capacity.memory) of all authorized nodes;
Container group default memory: the input value must be in the range from 1 to the maximum memory capacity (status.capacity.memory) among all authorized nodes, and less than the namespace memory maximum above;
Storage: the input value must be in the range from 1 to the sum of the allocatable storage capacity (status.allocatable["ephemeral-storage"]) of all authorized nodes;
Volume count: the input value must be in the range from 1 to the number of authorized nodes multiplied by 23;
Container group count: the input value must be in the range from 1 to the sum of the Pod capacity (status.capacity.pods) of all authorized nodes;
GPU cards: the input value must be in the range from 1 to the sum of status.capacity["nvidia.com/gpu"] of all authorized nodes.
Workload issuing module:
An application instance/workload is created and edited, and the container group replica count and the container quota maximum limits are dynamically adjusted according to the comparison between the namespace quota and the node allocatable resources.
1) Node allocatable resource acquisition:
Node allocatable Pod count: Pod capacity − allocated Pod count;
Node allocatable CPU requests: CPU capacity − allocated CPU requests;
Node allocatable CPU limits: CPU capacity − allocated CPU limits;
Node allocatable memory requests: memory capacity − allocated memory requests;
Node allocatable memory limits: memory capacity − allocated memory limits;
Node allocatable GPU cards: status.capacity["nvidia.com/gpu"] − allocated GPU cards.
2) Dynamically setting the workload replica count and resource quotas when issuing the workload:
Replica count: the namespace container group count quota is compared with the allocatable Pod count of the authorized nodes, and the smaller value is taken;
If the namespace has a container group default CPU and default memory configured, the CPU and memory requests are automatically set to these default values.
CPU request: the namespace CPU quota is compared with the maximum allocatable CPU among the authorized nodes and the smaller value is taken; if the namespace has no CPU quota configured, it is set to the maximum allocatable CPU of the authorized nodes;
CPU limit: the maximum value is linked with the CPU request; when the CPU request is set, the same CPU limit is set, and the CPU limit may then be changed to a value larger than the CPU request;
Memory request: the namespace memory quota is compared with the maximum allocatable memory among the authorized nodes and the smaller value is taken; if the namespace has no memory quota configured, it is set to the maximum allocatable memory of the authorized nodes;
Memory limit: the maximum value is linked with the memory request; when the memory request is set, the same memory limit is set, and the memory limit may then be changed to a value larger than the memory request;
GPU limit: the namespace GPU quota is compared with the maximum allocatable GPU among the authorized nodes and the smaller value is taken; if the namespace has no GPU quota configured, it is set to the maximum allocatable GPU of the authorized nodes.
Resource dynamic synchronization module:
The namespace resource quota is dynamically updated as the cluster adds or deletes nodes. If a node is moved out, the user is prompted that the [XXXX1, XXXX2] namespace resource quotas will be dynamically updated.
In another embodiment of the present invention, a method for dynamically managing Kubernetes cluster resources is provided, including:
Step 1, hardware resource monitoring and dynamic resource pool construction. Hardware information of all physical nodes is acquired through the Kubernetes API, including the CPU model (e.g., Intel Xeon 8358), GPU vendor (e.g., NVIDIA A100), memory type (DDR4), and storage capacity (e.g., 1 TB NVMe SSD).
Equivalent calculation unit definition:
CPU units = core count × base frequency (GHz) × vendor coefficient (Intel = 1.2, AMD = 1.0).
GPU units = video memory (GB) × CUDA core count × vendor coefficient (NVIDIA = 1.5).
Binary tree partitioning strategy: node resources are recursively decomposed according to a binary tree (for example, a node with 32 CPU cores in total is first partitioned into a 16-core left subtree and a 16-core right subtree, continuing until the leaf nodes are 1-core units).
Logical label marking: computing-unit labels, e.g., cpu=32, gpu=2, mem=128GB, are added to the nodes to identify the types and numbers of computing units they contain.
Step 2, load prediction and resource demand calculation.
Load time-series data acquisition: the CPU utilization, memory occupancy, and network I/O of each container group are collected every 15 seconds and stored into InfluxDB, with the following data structure:
Timestamp: 2023-09-20T14:30:00; Pod ID: pod-abc-123; CPU (%): 68.7; Memory (GB): 4.1.
LSTM prediction model processing:
Model input: the load data of the past 60 time steps (15 minutes), normalized to the [0, 1] interval.
Prediction output: a load value Z(t) for the next 5 minutes, converted into a physical resource demand after inverse normalization (e.g., a predicted CPU demand of 8 cores).
Step 3: dynamic namespace adjustment.
Resource margin calculation: the cluster's real-time available resources are Y(t) = total resources − allocated resources (for example, with 100 CPU cores in total and 60 in use, Y(t) = 40).
Decision logic:
If Y(t) ≥ Z(t) × 1.2 (buffer coefficient), the existing namespace quota is adjusted (for example, the CPU quota of namespace A is increased from 20 cores to 30 cores).
If Y(t) < Z(t) × 0.8 and isolation is required (for example, a financial risk-control rule scenario), a new namespace is created and bound to dedicated node resources.
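The decision logic above can be sketched as a single function. The 1.2 and 0.8 buffer coefficients come from the text; the returned action names are illustrative assumptions.

```python
# Sketch of Step 3's decision logic. Y(t) = total - allocated resources;
# thresholds 1.2 and 0.8 are the buffer coefficients from the text.

def available(total: float, allocated: float) -> float:
    """Y(t): real-time available cluster resources."""
    return total - allocated

def namespace_decision(y_t: float, z_t: float, needs_isolation: bool) -> str:
    if y_t >= z_t * 1.2:
        # Enough headroom: grow the existing namespace quota in place.
        return "adjust-existing-quota"
    if y_t < z_t * 0.8 and needs_isolation:
        # Tight resources plus an isolation requirement: create a new
        # namespace bound to dedicated node resources.
        return "create-dedicated-namespace"
    return "no-change"

# Example from the text: 100 total CPU cores, 60 allocated -> Y(t) = 40.
y = available(100, 60)
```

Note the middle band (0.8 × Z(t) ≤ Y(t) < 1.2 × Z(t), or tight resources without an isolation requirement) falls through to "no-change" in this sketch; the text does not specify behavior there.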
Pre-configuration rule enforcement:
Input constraints:
CPU request value range: 1 ≤ requested value ≤ sum of all node CPUs (e.g., 100 cores).
Container group default value: the smaller of the single-node CPU capacity (e.g., 32 cores) and the namespace upper bound (e.g., 50 cores).
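These input constraints amount to a clamp and a minimum, sketched below; the function names are illustrative.

```python
# Sketch of the pre-configuration input constraints from Step 3: a CPU request
# is clamped to [1, cluster total], and a container group's default CPU is the
# smaller of the single-node capacity and the namespace upper bound.

def clamp_cpu_request(requested: int, cluster_total: int) -> int:
    """Enforce 1 <= request <= sum of all node CPUs (e.g. 100 cores)."""
    return max(1, min(requested, cluster_total))

def default_pod_cpu(node_capacity: int, namespace_limit: int) -> int:
    """Default: smaller of node capacity (e.g. 32) and namespace cap (e.g. 50)."""
    return min(node_capacity, namespace_limit)
```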
Step 4: elastic scaling of container groups. The optimal replica count is calculated as:

OptimalReplicas = floor[(X × w1 + Y(t) × w2 + Z(t) × w3) / C]

where X is the number of computing units allocated to the namespace, Y(t) is the cluster's real-time available computing units, Z(t) is the predicted load, w1 is the hard-limit weight, w2 is the real-time resource weight, w3 is the historical-load weight, and C is the per-replica computing-unit demand.
With X = 50, Y(t) = 40, Z(t) = 60, weights 0.4/0.3/0.3, and C = 1.0, the calculation result is OptimalReplicas = floor[(50 × 0.4 + 40 × 0.3 + 60 × 0.3) / 1.0] = floor(50) = 50 replicas.
HPA strategy: if the current replica count is 30 and OptimalReplicas = 50, capacity is expanded in 10% steps (30 → 33 → 36 → …). A 400-second lock is applied after each expansion to prevent repeated operations caused by metric fluctuations.
Step 5: access control and auditing.
User-Pod-computing-unit mapping: a Pod created by user Alice in the finance namespace is only allowed to access GPU computing units whose labels contain security: encrypted.
Dynamic policy engine: when a user requests GPU resources, the engine checks whether the user's namespace holds the gpu-access=true RBAC permission.
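The two checks in Step 5 reduce to label matching, sketched below. The data model (plain dictionaries of labels) is an illustrative assumption, not an actual Kubernetes RBAC API call; only the label keys mirror the text.

```python
# Sketch of Step 5's access-control checks: a compute-unit label gate and a
# namespace-level policy gate.

def may_use_gpu_unit(unit_labels: dict[str, str],
                     required: dict[str, str]) -> bool:
    """A Pod may use a GPU unit only if the unit carries all required labels."""
    return all(unit_labels.get(k) == v for k, v in required.items())

def rbac_allows_gpu(namespace_labels: dict[str, str]) -> bool:
    """Policy-engine check: the namespace must carry gpu-access=true."""
    return namespace_labels.get("gpu-access") == "true"

# Example: Alice's Pod requires GPU units labeled security: encrypted.
required = {"security": "encrypted"}
```

Both gates would run at admission time, with each decision written to the audit log.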
In some embodiments, the Kubernetes cluster resource dynamic management system may include a plurality of functional modules composed of computer program segments. The computer program of each program segment in the system may be stored in a memory of a computer device and executed by at least one processor to perform the functions of Kubernetes cluster resource dynamic management (see FIG. 1 for details).
In this embodiment, the Kubernetes cluster resource dynamic management system may be divided into a plurality of functional modules according to the functions it performs, as shown in FIG. 3. A module in the present invention refers to a series of computer program segments, stored in a memory, that can be executed by at least one processor and perform a fixed function. The functions of the respective modules will be described in detail in the following embodiments.
The monitoring module is used for monitoring hardware resources of the cluster's physical nodes;
the prediction module is used for monitoring load data of the cluster and generating a predicted load based on the load data;
the configuration module is used for dynamically adjusting the resource quota of a namespace, or creating a new namespace, according to the pre-configuration rules and the hardware resources;
the adjusting module is used for automatically scaling the container group replica count through the HPA based on the predicted load and the namespace.
FIG. 4 is a schematic diagram of a device to which the Kubernetes cluster resource dynamic management method according to an embodiment of the present application may be applied. Those skilled in the art will appreciate that the structure shown does not constitute a limitation on the device, which may include more or fewer components than illustrated, combine certain components, or arrange components differently. In embodiments of the present application, devices include, but are not limited to, laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit the implementations of the embodiments of the application described and/or claimed herein.
The device 400 may include a processor 410, a memory 420, and a communication unit 430. These components may communicate via one or more buses. Those skilled in the art will appreciate that the server structure shown in the drawings does not limit the invention: it may be a bus structure or a star structure, may include more or fewer components than shown, may combine certain components, or may arrange components differently.
The memory 420 may be used to store the execution instructions of the processor 410, and may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The instructions in the memory 420, when executed by the processor 410, enable the apparatus 400 to perform some or all of the steps in the method embodiments described below.
The processor 410 is the control center of the storage device; it connects the various parts of the overall electronic device using various interfaces and lines, and performs the various functions of the electronic device and/or processes data by running or executing software programs and/or modules stored in the memory 420 and invoking data stored in the memory. The processor may consist of an integrated circuit (IC), for example a single packaged IC, or of multiple packaged ICs connected together that perform the same or different functions. For example, the processor 410 may include only a central processing unit (CPU). In embodiments of the invention, the CPU may have a single operation core or may include multiple operation cores.
The communication unit 430 is configured to establish a communication channel, so that the storage device can communicate with other devices, receiving user data sent by other devices or sending user data to other devices.
The present invention also provides a computer storage medium in which a program may be stored; when executed, the program may include some or all of the steps of the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
It will be apparent to those skilled in the art that the techniques of the embodiments of the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on such an understanding, the technical solution in the embodiments of the present invention, or the part that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium such as a USB flash drive, a mobile hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, i.e., various media that can store program code, including several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method described in the embodiments of the present invention.
The same or similar parts between the various embodiments in this specification are referred to each other. In particular, for the apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, as far as reference is made to the description in the method embodiments.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division of the modules is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the couplings or direct couplings or communication connections shown or discussed between components may be indirect couplings or communication connections through some interfaces, systems, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
Although the present invention has been described in detail by way of preferred embodiments with reference to the accompanying drawings, the present invention is not limited thereto. Various equivalent modifications and substitutions may be made to the embodiments of the present invention by those skilled in the art without departing from the spirit and scope of the present invention, and all such modifications and substitutions are intended to fall within the scope of the present invention as defined by the appended claims.