CN118838719A - Distributed computing load balancing method and system - Google Patents
- Publication number: CN118838719A (application CN202411310655.9A)
- Authority: CN (China)
- Prior art keywords: computing, estimated, statement, distributed computing, task
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F9/5083: Techniques for rebalancing the load in a distributed system
- G06F16/2471: Distributed queries
- G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F9/5016: Allocation of resources to service a request, the resources being hardware resources other than CPUs, Servers and Terminals, the resource being the memory
- G06F9/505: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals, considering the load
- G06F9/5066: Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
Description
Technical Field
Embodiments of this specification relate to the field of information technology, and in particular to a distributed computing load balancing method and system.
Background Art
Distributed computing is the process of breaking a computing task into parts and executing them in parallel on multiple computers (nodes). The nodes are connected through a network and cooperate to complete one or more complex tasks. Distributed computing is characterized by parallel processing, resource sharing, strong fault tolerance, high scalability, and heterogeneity. Load balancing distributes the workload across the nodes of a distributed computing environment so that no node is overloaded while others sit idle. It is essential to keeping a system stable and responsive. Its goals are to optimize resource usage, maximize throughput, minimize response time, and avoid overloading any node. Common load balancing strategies include round robin, least connections, weighted, geolocation-based, and content-based. A content-based strategy forwards each request to a specific node according to the request's content type or specific parameters. Load balancing is very common in modern Internet applications and services, especially where high availability and scalability are required, such as e-commerce websites, social media platforms, and online games. However, when a task must process database query results row by row, the number of rows returned is uncertain, so the actual amount of computation is unpredictable, and current load balancing techniques may overload a node. Load balancing technology therefore needs to be improved.
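For orientation, three of the strategies named above can be sketched in a few lines of Python. This is a minimal illustration, not part of the patent; the node names and the naive weight expansion are assumptions. None of the three looks at how many rows a task's query will return, which is the gap described above.

```python
from itertools import cycle

def round_robin(nodes):
    """Return an iterator that yields nodes in a fixed rotating order."""
    return cycle(nodes)

def least_connections(active):
    """Pick the node with the fewest active connections."""
    return min(active, key=active.get)

def weighted_round_robin(weights):
    """Repeat each node by its integer weight, then rotate (naive form)."""
    expanded = [node for node, w in weights.items() for _ in range(w)]
    return cycle(expanded)

# Hypothetical node names, for illustration only.
rr = round_robin(["n1", "n2", "n3"])
rr_picks = [next(rr) for _ in range(4)]

lc_pick = least_connections({"n1": 7, "n2": 2, "n3": 5})

wrr = weighted_round_robin({"n1": 2, "n2": 1})
wrr_picks = [next(wrr) for _ in range(3)]
```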
Summary of the Invention
Embodiments of this specification describe a distributed computing load balancing method and system.
In a first aspect, an embodiment of this specification provides a distributed computing load balancing method, comprising the steps of:
receiving a plurality of computing tasks that can be executed in parallel and into which a target task has been divided, and reading the code of the computing tasks;
identifying a database query statement in the code, executing the database query statement, and obtaining result data;
obtaining an estimated load of the computing task according to the result data, the estimated load including an estimated time consumption, an estimated memory space occupancy, and an estimated CPU resource occupancy;
reading the node load of the distributed computing nodes, the node load including the current estimated queue waiting time, memory space occupancy, and CPU resource occupancy;
obtaining, according to the estimated load and the node load, all distributed computing nodes suited to each computing task, together with the estimated completion time of the computing task;
obtaining a designated distributed computing node for each computing task according to the estimated completion times, such that the time difference between the estimated completion times at which the distributed computing nodes finish all of their designated computing tasks is minimized.
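The last three steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the `Task` and `Node` classes and their field names are hypothetical, a node is treated as suitable when its free memory and CPU cover the task's estimates, tasks on one node are assumed to run back to back, and the assignment is found by exhaustive search, which is only practical for a handful of tasks.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Task:
    name: str
    est_seconds: float   # estimated time consumption
    est_mem_mb: float    # estimated memory space occupancy
    est_cpu: float       # estimated CPU share, 0..1

@dataclass
class Node:
    name: str
    queue_wait: float    # estimated wait for work already queued, in seconds
    free_mem_mb: float
    free_cpu: float

def eligible_nodes(task, nodes):
    """Nodes whose free memory and CPU can hold the task."""
    return [n for n in nodes
            if n.free_mem_mb >= task.est_mem_mb and n.free_cpu >= task.est_cpu]

def assign(tasks, nodes):
    """Try every eligible assignment; keep the smallest completion-time spread."""
    if not tasks:
        return {}, 0.0
    options = [eligible_nodes(t, nodes) for t in tasks]
    best, best_spread = None, float("inf")
    for combo in product(*options):
        clock = {n.name: n.queue_wait for n in nodes}
        finishes = []
        for task, node in zip(tasks, combo):
            clock[node.name] += task.est_seconds  # tasks on one node run in sequence
            finishes.append(clock[node.name])
        spread = max(finishes) - min(finishes)
        if spread < best_spread:
            best = {t.name: n.name for t, n in zip(tasks, combo)}
            best_spread = spread
    return best, best_spread

tasks = [Task("t1", 10.0, 100.0, 0.2),
         Task("t2", 4.0, 100.0, 0.2),
         Task("t3", 7.0, 100.0, 0.2)]
nodes = [Node("a", 0.0, 4096.0, 0.9),
         Node("b", 3.0, 4096.0, 0.9),
         Node("c", 0.0, 50.0, 0.9)]   # too little free memory for any task

plan, spread = assign(tasks, nodes)
```

Here node `c` is filtered out by the suitability check, and the search places `t1` and `t2` on `a` and `t3` on `b`, so the three estimated completion times differ by at most 4 seconds.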
In a second aspect, an embodiment of this specification provides a distributed computing load balancing system, comprising:
a receiving module, which receives a plurality of computing tasks that can be executed in parallel and into which a target task has been divided, and reads the code of the computing tasks;
a query module, which identifies a database query statement in the code, executes the database query statement, and obtains result data;
an estimation module, which obtains an estimated load of the computing task according to the result data, the estimated load including an estimated time consumption, an estimated memory space occupancy, and an estimated CPU resource occupancy;
a reading module, which reads the node load of the distributed computing nodes, the node load including the current estimated queue waiting time, memory space occupancy, and CPU resource occupancy;
an adaptation module, which obtains, according to the estimated load and the node load, all distributed computing nodes suited to each computing task, together with the estimated completion time of the computing task;
a designation module, which obtains a designated distributed computing node for each computing task according to the estimated completion times, such that the time difference between the estimated completion times of all computing tasks corresponding to the target task is minimized.
In a third aspect, an embodiment of this specification provides an electronic device comprising a processor and a memory;
the processor is connected to the memory;
the memory is used to store executable program code;
the processor reads the executable program code stored in the memory and runs the corresponding program in order to execute the method described in any of the above aspects.
In a fourth aspect, an embodiment of this specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described in any of the above aspects.
In a fifth aspect, an embodiment of this specification provides a computer program product comprising a computer program which, when executed by a processor, implements the method described in any of the above aspects.
The beneficial effects of the technical solutions provided by some embodiments of this specification include at least the following:
In embodiments of this specification, the distributed computing load balancing method executes the database query statements of a computing task in advance on a designated server. The estimated load of the computing task can then be estimated more accurately from the number of rows in the result data, which helps schedule the distributed computing load more evenly. By reading the node load of the distributed computing nodes and combining it with the estimated load of the computing tasks, the method minimizes the time difference between the estimated completion times at which the nodes finish all of their designated computing tasks. The results of the computing tasks can therefore be merged sooner to obtain the result of the target task, which improves the efficiency of distributed computing. Executing the query statements in advance on the designated server also spares the distributed nodes from frequently establishing connections to the database server, which reduces pressure on the database and helps improve its execution efficiency.
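A minimal sketch of the pre-execution idea, using Python's standard `sqlite3` module as a stand-in for the designated server's database. The table layout, the per-row cost constants, and the function name `estimate_load` are all assumptions made for illustration; the patent does not specify them.

```python
import sqlite3

# Hypothetical per-row costs; a real system would calibrate these.
SECONDS_PER_ROW = 0.002
BYTES_PER_ROW = 512

def estimate_load(conn, query, params=()):
    """Run the task's query once, up front, and size the task from its row count."""
    rows = conn.execute(query, params).fetchall()
    n = len(rows)
    load = {"rows": n,
            "est_seconds": n * SECONDS_PER_ROW,
            "est_bytes": n * BYTES_PER_ROW}
    return rows, load

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, merchant TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders (merchant, status) VALUES (?, ?)",
    [("m1", "pending")] * 300 + [("m2", "pending")] * 5)

rows, load = estimate_load(
    conn,
    "SELECT id FROM orders WHERE merchant = ? AND status = 'pending'",
    ("m1",))
```

Because the rows are fetched once by the scheduler, they could be shipped together with the task, so a compute node would not need to open its own database connection.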
Other features and advantages of the embodiments of this specification are further disclosed in the detailed description and drawings below.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of this specification more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of this specification; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario of the distributed computing load balancing method provided in an embodiment of this specification.
FIG. 2 is a flow chart of the distributed computing load balancing method provided in an embodiment of this specification.
FIG. 3 is a flow chart of the method for merging database query statements provided in an embodiment of this specification.
FIG. 4 is a flow chart of the query condition merging method provided in an embodiment of this specification.
FIG. 5 is a flow chart of the method for merging numerical conditions provided in an embodiment of this specification.
FIG. 6 is a flow chart of the method for estimating time consumption provided in an embodiment of this specification.
FIG. 7 is a flow chart of the method for estimating memory space occupancy provided in an embodiment of this specification.
FIG. 8 is a flow chart of the method for estimating CPU resource occupancy provided in an embodiment of this specification.
FIG. 9 is a schematic diagram of the distributed computing load balancing system provided in an embodiment of this specification.
FIG. 10 is a schematic diagram of the electronic device provided in an embodiment of this specification.
Detailed Description
The technical solutions of the embodiments of this specification are explained and described below with reference to the accompanying drawings. The embodiments described are only preferred embodiments of this specification, not all of them. Other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort all fall within the scope of protection of this specification.
The terms "first", "second", "third", and so on in the description, the claims, and the above drawings are used to distinguish different objects, not to describe a specific order. The terms "including" and "having", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units; it may optionally include steps or units that are not listed, or other steps or units inherent to the process, method, product, or device.
In the following description, terms that indicate orientation or positional relationships, such as "inside", "outside", "up", "down", "left", and "right", are used only to make the embodiments easier to describe. They do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be understood as limiting this specification.
All data involved in this specification is information and data authorized by the user or fully authorized by all parties, and the collection of the data complies with the relevant laws, regulations, and standards of the relevant countries and regions.
Before the technical solutions provided in this specification are introduced, the related technologies and application scenarios are described.
With the spread of the Internet and the development of information technology, large amounts of data are being generated, and traditional centralized processing can no longer meet the demand for processing it. As competition between enterprises intensifies and services multiply, enterprises facing growing data volumes and complex computing tasks need the ability to expand computing resources dynamically on demand and to respond flexibly to different computing tasks. Applications such as artificial intelligence and big data analysis all require systems with strong computing capabilities.
By expanding computing resources dynamically, a distributed computing system can adjust its computing power to actual demand and avoid wasting resources during off-peak hours. For example, an enterprise may face sudden surges in traffic or computing demand during promotions, holidays, or other special events, and dynamic expansion keeps the system running normally through those peaks. Distributed computing systems also provide automatic fault recovery: when the system fails or needs maintenance, dynamic expansion keeps services uninterrupted, which improves business continuity and reliability. These advantages have led to the rapid development and wide adoption of distributed computing technology.
Distributed computing is mainly implemented through cloud computing, containerization and microservices, automation tools, and hybrid-cloud and multi-cloud technologies. Cloud computing providers (such as AWS, Azure, and Google Cloud) offer elastic computing services, so users can add or remove computing resources at any time according to demand. Containerization (such as Docker) and microservice architectures allow applications to be deployed and scaled quickly and improve resource utilization. Automation tools (such as Kubernetes and Ansible) manage and schedule computing resources and scale them up and down automatically. Hybrid-cloud and multi-cloud approaches combine the advantages of public and private clouds, or use several public cloud providers, to allocate resources flexibly according to actual needs.
Implementing distributed computing requires addressing data consistency, fault recovery and fault tolerance, load balancing, data security, network management, and data management. Data consistency ensures that the data on every node of the distributed system agrees, especially during read and write operations. Fault recovery and fault tolerance require faulty nodes to be detected promptly and then isolated and recovered. Data security relies on encryption to protect data in transit and in storage. Network management must cope with high latency between nodes in different geographical locations. Data management splits large tables into smaller blocks stored on different nodes and migrates data transparently to the application. The goal of load balancing is to make the load on each node as even as possible and to avoid some nodes being overloaded while others are idle. This maximizes resource utilization, reduces task processing time, and improves system reliability and response speed. It ensures that computing resources are allocated sensibly among the computing tasks, which improves the overall performance and efficiency of the system.
Common load balancing techniques include round robin, least connections, weighted round robin, weighted least connections, and content-based allocation. Content-based allocation assigns each request to the most suitable node according to the content of the request or specific parameters, so allocation can be optimized for different types of tasks.
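As a rough illustration of content-based allocation, the sketch below routes a request to a node pool based on fields of the request itself. The rule format, the field names `type` and `rows`, and the pool names are hypothetical and chosen only for this example.

```python
def route(request, rules, default):
    """Return the pool of the first rule whose predicate matches the request."""
    for predicate, pool in rules:
        if predicate(request):
            return pool
    return default

# Hypothetical routing rules: each pairs a test on the request with a node pool.
rules = [
    (lambda r: r.get("type") == "report", "batch-pool"),
    (lambda r: r.get("rows", 0) > 10_000, "large-pool"),
]

report_pool = route({"type": "report"}, rules, "general-pool")
large_pool = route({"type": "order", "rows": 50_000}, rules, "general-pool")
small_pool = route({"type": "order", "rows": 10}, rules, "general-pool")
```

The second rule hints at why row counts matter: a content-based balancer can only route on `rows` if that number is known before the task is dispatched.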
The distributed computing load balancing method provided in this specification falls within the category of content-based load balancing. Because this specification uses some technical terms, those terms are introduced first.
Compute-intensive statements
Compute-intensive statements are code or code segments that involve a large number of computing operations. They are common in scientific computing, engineering simulation, big data processing, machine learning, and similar fields. Such statements have high computational complexity, consume substantial computing resources, and may require parallel processing or distributed computing to run efficiently. Some typical examples follow. Numerical computing often requires matrix operations, solving systems of linear equations, and similar tasks. For example, matrix multiplication with Python's NumPy library can be written as:
import numpy as np

# Create two random 1000x1000 matrices
A = np.random.rand(1000, 1000)
B = np.random.rand(1000, 1000)

# Compute the matrix product
C = np.dot(A, B)
This code creates two 1000x1000 random matrices and multiplies them. Matrix multiplication, np.dot(A, B), is a typical compute-intensive task, especially for large matrices.
Nested loops are another common form of intensive computation, especially when processing multidimensional arrays.
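To make the nested-loop case concrete, here is the same matrix product written with three explicit loops in plain Python. It is an illustrative sketch rather than code from the patent; the function name `matmul` is an assumption. For an n-by-n input the innermost statement runs n³ times, which is why the work grows so quickly with input size.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows, using three nested loops."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):          # each row of a
        for j in range(p):      # each column of b
            for k in range(m):  # each term of the dot product
                c[i][j] += a[i][k] * b[k][j]
    return c

result = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```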
Gradient computation in machine learning is another common form of intensive computation. In machine learning, and especially in deep learning, gradient descent requires computing the gradient of the loss function with respect to the model parameters. This is compute-intensive, particularly when the model is complex and the data set is large.
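A small worked example of such a gradient computation, assuming a one-variable linear model y = w·x + b with mean squared error loss (the model, data, learning rate, and step count are illustrative choices, not taken from the patent). Each call to `mse_gradient` touches every data point once, so the cost of training scales with both data size and the number of steps.

```python
def mse_gradient(w, b, xs, ys):
    """Gradient of mean((w*x + b - y)^2) with respect to w and b."""
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
    grad_b = 2.0 / n * sum(residuals)
    return grad_w, grad_b

def fit(xs, ys, lr=0.05, steps=2000):
    """Plain gradient descent starting from w = b = 0."""
    w = b = 0.0
    for _ in range(steps):
        gw, gb = mse_gradient(w, b, xs, ys)
        w -= lr * gw
        b -= lr * gb
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated from y = 2x + 1
w, b = fit(xs, ys)
```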
Compute-intensive tasks usually involve a large number of mathematical operations, such as addition, multiplication, and matrix operations, and they may consume a large amount of memory when processing large data sets. Many of them can be accelerated through parallel processing, for example on accelerator hardware such as GPUs and TPUs. Improving efficiency usually also requires optimizing the algorithm, for example by using more efficient matrix libraries or parallel computing frameworks.
Load
In distributed computing, "load" usually refers to the amount of work or number of tasks carried by each node in the system. It can be understood as the degree to which system resources are occupied, including but not limited to CPU usage, memory usage, disk I/O operations, and network traffic. Load is an important indicator of the performance and resource utilization of a distributed system, and it must be managed effectively for the system to run efficiently. Load management mainly comprises load balancing, dynamic scheduling, resource reservation and allocation, elastic scaling, and monitoring and early warning. Load balancing allocates tasks sensibly among the nodes so that their loads stay relatively even. Dynamic scheduling adjusts task allocation according to the real-time load, for example by assigning new tasks to the nodes with the lowest current load. Resource reservation and allocation sets resources aside in advance for key tasks so that important tasks get enough computing resources. Elastic scaling adjusts the size of the system automatically as load changes, for example by adding nodes during high-load periods and removing them during low-load periods. Monitoring and early warning tracks the load of each node in real time and raises an alert when the load exceeds a threshold so that action can be taken promptly.
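Two of these mechanisms, dynamic scheduling to the least-loaded node and threshold-based warning, can be sketched as below. The cost unit, the node names, and the threshold are assumptions for illustration; load here is a single number rather than the CPU, memory, disk, and network measures listed above.

```python
import heapq

def dispatch(task_costs, node_names):
    """Send each task to the node with the lowest accumulated load so far."""
    heap = [(0.0, name) for name in node_names]
    heapq.heapify(heap)
    placement = {}
    for task, cost in task_costs:
        load, name = heapq.heappop(heap)       # least-loaded node
        placement[task] = name
        heapq.heappush(heap, (load + cost, name))
    loads = {name: load for load, name in heap}
    return placement, loads

def overloaded(loads, threshold):
    """Names of nodes whose load exceeds the warning threshold."""
    return sorted(name for name, load in loads.items() if load > threshold)

placement, loads = dispatch(
    [("t1", 5.0), ("t2", 3.0), ("t3", 4.0), ("t4", 1.0)], ["a", "b"])
alerts = overloaded(loads, threshold=6.5)
```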
Node load
In a distributed computing environment, node load refers to the resource usage of each computing node while it executes tasks. Managing it is key to keeping a distributed system running efficiently. Its indicators include but are not limited to CPU usage, memory usage, disk I/O operations, and network traffic. When a distributed computing task 30 includes a database SQL query statement, the SQL query itself may place a certain load on the database 40, especially when there are many query conditions and data must be fetched from multiple tables.
Probable distribution probability
The probable distribution probability is an approximate estimate of a distribution probability, that is, a probability distribution that contains some error or uncertainty. Computing the exact distribution probability of the values in a column is expensive: it requires a large number of database queries and is very inefficient, especially for database systems with many columns. The technical solution provided in this specification places no requirement on the accuracy of the distribution probability, so an approximate estimate is sufficient. The probable distribution probability can be obtained from a rough observation of the data or from a limited sample; it does not need to be an exact distribution derived from a large amount of detailed data. It includes some degree of subjective judgment or model-based prediction and therefore carries some uncertainty. When data quality is low or the data collection process contains errors, the resulting probable distribution probability is somewhat imprecise. This error and imprecision have essentially no effect on the implementation of the technical solution provided in this specification.
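One way to obtain such an estimate from a limited sample is sketched below. This is an assumption-laden illustration, not the patent's procedure: the sampling method, the sample size, the function name, and the example column of province names are all hypothetical.

```python
import random
from collections import Counter

def approximate_distribution(values, sample_size, seed=0):
    """Estimate how often each value occurs from a random sample of the column."""
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    counts = Counter(sample)
    return {value: count / len(sample) for value, count in counts.items()}

# Hypothetical column of 10,000 delivery provinces with a 60/30/10 split.
column = ["Zhejiang"] * 6000 + ["Guangdong"] * 3000 + ["Beijing"] * 1000
approx = approximate_distribution(column, sample_size=500)
```

Multiplying a table's total row count by the estimated share of a value gives a rough row count for a filter on that value, without scanning the whole column.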
以较能够体现本说明书提供的分布式计算负载均衡方法优势的应用场景双十一电商促销期间商家10后台“订单发货确认”作为示例,对本说明书技术方案进行介绍。每个商家10均需要执行订单发货确认的任务,显然双十一电商促销期间商家10使用分布式计算技术实现订单的处理是有必要的,因此需要使用到负载均衡技术。在分布式计算技术实现订单的处理时,较适宜使用基于内容分配的负载均衡技术。其原因在于不同商家10发起的“订单发货确认”的分布式计算任务30,其对应的数据处理量是不同的,甚至是差别极大的。Taking the "order shipment confirmation" of the merchant 10 backstage during the Double Eleven e-commerce promotion as an example, which is an application scenario that can best reflect the advantages of the distributed computing load balancing method provided by this specification, the technical solution of this specification is introduced. Each merchant 10 needs to perform the task of order shipment confirmation. Obviously, it is necessary for the merchant 10 to use distributed computing technology to realize order processing during the Double Eleven e-commerce promotion, so load balancing technology is needed. When distributed computing technology is used to realize order processing, it is more appropriate to use content-based distribution-based load balancing technology. The reason is that the distributed computing tasks 30 of "order shipment confirmation" initiated by different merchants 10 have different corresponding data processing volumes, and even huge differences.
The size of the distributed computing task 30 initiated by a merchant 10 is determined by the merchant's order volume: the larger the order volume, the greater the computation and the longer the task takes. The difficulty is that the amount of computation for an "order shipment confirmation" task 30 can only be learned by querying the database 40. A further difficulty is that, when initiating such a task, the merchant 10 does not necessarily include all orders pending shipment in the current distributed computing task 30.
For example, the merchant 10 may wish to include only the pending-shipment orders for product A in the current distributed computing task 30, so that product A is shipped. As another example, the merchant 10 may include only the pending-shipment orders placed yesterday morning. As yet another example, the merchant 10 may include the pending-shipment orders whose delivery addresses lie in a few selected provinces. These filter conditions can also be combined, for instance product together with delivery address, and only orders that satisfy the combined conditions are included in the current distributed computing task 30. As a result, building indexes in advance does not adequately solve the problem that the computation amount of an "order shipment confirmation" task 30 is hard to determine.
If the database 40 is queried beforehand to obtain the number of pending-shipment orders of the merchant 10 that initiated the distributed computing task 30, the computation amount of the task can be determined fairly accurately. However, this increases the number of database queries, raises the load on the database system, and reduces the efficiency of the distributed computing system.
To this end, this specification provides a distributed computing load balancing method. Refer to FIG. 1, a schematic diagram of the application scenario framework of the method. The method runs on a server, which may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms. In short, it is a server that can exchange data with the distributed computing system that ultimately executes the distributed computing tasks 30 and that can control how the computing tasks 30 are allocated.
First, the merchant 10 initiates a distributed computing task 30, namely the target task, through the console 20. The console 20 is provided by the distributed computing system; dividing the target task into multiple computing tasks 30 that can be executed in parallel may use techniques already disclosed in the art. Then the server running the distributed computing load balancing method provided in this specification receives the parallel-executable computing tasks divided from the target task, establishes a connection with the database 40, and executes the database query statements. It obtains the estimated load 31 of each computing task from the result data and, taking into account the current node load of the distributed computing nodes 50, allocates the computing tasks so that the load across the distributed computing nodes 50 is balanced.
The database 40 is a collection of data that is stored in an organized manner, can be shared among multiple accounts, has as little redundancy as possible, and is independent of application programs. A database management system (Database Management System, DBMS) is a software system designed to manage the database 40 and generally provides basic functions such as storage, retrieval, security, and backup. Database management systems can be classified by the database model they support, such as relational or Extensible Markup Language (XML); by the type of computer they support, such as server clusters or mobile phones; by the query language they use, such as Structured Query Language (SQL) or XQuery; by their performance emphasis, such as maximum scale or maximum operating speed; or in other ways. Whichever classification is used, some DBMSs span categories, for example by supporting several query languages at once.
Refer to FIG. 2. A distributed computing load balancing method provided in this specification includes the following steps:
Step S01) Receive multiple parallel-executable computing tasks divided from a target task, and read the code of the computing tasks.
Dividing a target task into multiple computing tasks that can be executed in parallel is an important step in distributed computing, known in the art as task decomposition or task partitioning. Approaches include data parallelism and task parallelism. With data parallelism, the data can be divided by uniform, hash, range, or random partitioning. With task parallelism, the work is divided along the task's logical steps or functional modules.
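To make two of the data-parallel options concrete, the sketch below shows hash partitioning and range partitioning. The function names and the use of CRC32 as the hash are assumptions chosen for illustration; the specification leaves the partitioning technique to the prior art.

```python
import bisect
import zlib


def hash_partition(keys, num_partitions):
    """Assign each key to a partition by a stable hash of its text form."""
    parts = [[] for _ in range(num_partitions)]
    for key in keys:
        index = zlib.crc32(str(key).encode("utf-8")) % num_partitions
        parts[index].append(key)
    return parts


def range_partition(values, upper_bounds):
    """Split values into half-open ranges (previous bound, bound].

    upper_bounds must be sorted ascending; values above the last bound
    go to one extra trailing partition.
    """
    parts = [[] for _ in range(len(upper_bounds) + 1)]
    for value in values:
        parts[bisect.bisect_left(upper_bounds, value)].append(value)
    return parts


# Hypothetical order amounts split at 100 and 1000.
amount_parts = range_partition([50, 100, 101, 1000, 1500], [100, 1000])
id_parts = hash_partition(range(10), 3)
```

Range partitioning keeps related values together, while hash partitioning tends to spread keys more evenly; which one suits a workload depends on the data.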
This embodiment uses data parallelism for task partitioning. For example, through the console 20 the merchant 10 creates a target task that performs pre-shipment review of the pending-shipment orders placed yesterday, so that orders passing the review can be synchronized to the warehouse for picking and shipping. As an example, the review includes an address check (whether the address is complete, whether the address exists, and whether the consignee's contact information is well formed), an inventory check (whether the goods in the order are in stock in the warehouse; when the order passes review, the corresponding stock is locked and deducted), and an order status check (whether the order is in the pending-shipment state rather than the refund-requested state). The console 20 divides this target task into multiple parallel-executable computing tasks according to the product types contained in the order, the province of the delivery address, and the order amount.
Specifically, take the product types to be only product A, only product B, only product C, and multiple product types; the delivery address provinces to be province A, province B, province C, and other provinces; and the order amount intervals to be (0,100], (100,1000], and (1000,20000). This yields 4×4×3 = 48 computing tasks. For example, computing task a performs the pre-shipment review of pending-shipment orders that contain multiple product types, have a delivery address in province A, and have an order amount in the interval (100,1000].
Computing task b performs the pre-shipment review of pending-shipment orders that contain only product A, have a delivery address in other provinces, and have an order amount in the interval (0,100]. Computing task c performs the pre-shipment review of pending-shipment orders that contain only product A, have a delivery address in other provinces, and have an order amount in the interval (100,1000].
Computing task d performs the pre-shipment review of pending-shipment orders that contain only product B, have a delivery address in province A, and have an order amount in the interval (0,100]. Computing task e performs the pre-shipment review of pending-shipment orders that contain only product B, have a delivery address in province A, and have an order amount in the interval (100,1000]. After the computing tasks have been divided and their code has been read, proceed to the next step.
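The 4×4×3 partition described above can be enumerated as a Cartesian product. The following is a minimal sketch; the dictionary keys and the constant names are assumptions introduced for this example.

```python
from itertools import product

# Partition dimensions taken from the example in the text.
PRODUCT_GROUPS = ["only product A", "only product B", "only product C", "multiple products"]
PROVINCE_GROUPS = ["province A", "province B", "province C", "other provinces"]
# Each pair is (low, high), the endpoints of an order amount interval.
AMOUNT_RANGES = [(0, 100), (100, 1000), (1000, 20000)]


def enumerate_tasks():
    """Return one task descriptor per combination of the three dimensions."""
    return [
        {"products": products, "province": province, "amount_range": amount_range}
        for products, province, amount_range in product(
            PRODUCT_GROUPS, PROVINCE_GROUPS, AMOUNT_RANGES
        )
    ]


tasks = enumerate_tasks()
# Descriptor corresponding to computing task b in the text.
task_b = {"products": "only product A", "province": "other provinces", "amount_range": (0, 100)}
```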
Step S02) Identify the database query statements in the code, execute them, and obtain result data.
For example, the main part of the code of computing task b is as follows:
import sqlite3

# Establish a connection to the database
def connect_to_database(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    return conn

def process_orders(conn: sqlite3.Connection):
    cursor = conn.cursor()
    # Query the orders that meet the conditions.
    # 'get_product_id(Product A)' and 'Province A/B/C' are shorthand placeholders kept from the listing.
    cursor.execute("""SELECT o.id, o.customer_name, o.shipping_address, o.phone,
        o.order_amount, oi.product_id, oi.quantity
        FROM orders o JOIN order_items oi ON o.id = oi.order_id
        WHERE o.product_id = 'get_product_id(Product A)'
        AND o.shipping_province != 'Province A/B/C'
        AND o.order_status = 'pending shipment'
        AND o.order_amount BETWEEN 0 AND 100 """)
    orders = cursor.fetchall()
    for order in orders:
        order_id, customer_name, shipping_address, phone, order_amount, product_id, quantity = order
        # Check that the address is complete
        if not is_address_complete(shipping_address):
            print(f"Order {order_id}: Address is incomplete.")
            continue
        # Check that the address is correct
        if not is_address_correct(shipping_address):
            print(f"Order {order_id}: Address is incorrect.")
            continue
        # Check that the phone number is valid
        if not is_phone_number_valid(phone):
            print(f"Order {order_id}: Phone number is invalid.")
            continue
        # Check inventory
        if not check_product_inventory(product_id, quantity, cursor):
            print(f"Order {order_id}: Product {product_id} does not have enough inventory.")
            continue
        # If all checks pass, update the order status
        cursor.execute("UPDATE orders SET order_status='reviewed' WHERE id=?", (order_id,))
    # Commit the status updates once the loop has finished
    conn.commit()

if __name__ == "__main__":
    db_path = "path/to/your/database.db"
    conn = connect_to_database(db_path)
    process_orders(conn)
    conn.close()
A dedicated library is also needed to implement the review. As an example, the library functions that check whether an address is complete and whether it is correct are as follows:
# Check whether the address is complete
def is_address_complete(address: str) -> bool:
    return all(part.strip() for part in address.split(','))

# Check whether the address is correct
def is_address_correct(address: str) -> bool:
    # Placeholder: the comparison against address review rules is omitted
    return True
The functions is_address_complete() and is_address_correct() are compute-intensive statements, because they must compare the address against a large number of address review rules. Their concrete implementation is not part of the technical improvement provided by this specification and is not discussed here; techniques already disclosed in the art may be used.
The code of the other computing tasks is similar to that of computing task b. In the code above, the cursor.execute() statement that issues the SELECT query and the orders = cursor.fetchall() statement are the database query statements. Before computing task b is scheduled onto a distributed computing node 50, these database query statements are executed in advance to obtain the result data.
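One possible way to identify the query statements in task code is to walk its syntax tree and collect string literals passed to `execute()`. The sketch below uses Python's standard `ast` module; the function name `extract_sql_statements` and the rule "first argument is a string literal starting with SELECT" are assumptions for illustration, not the identification method claimed by the specification.

```python
import ast


def extract_sql_statements(source: str) -> list:
    """Collect SELECT statements passed as string literals to .execute()."""
    statements = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            # Collapse whitespace so multi-line SQL compares cleanly.
            sql = " ".join(node.args[0].value.split())
            if sql.upper().startswith("SELECT"):
                statements.append(sql)
    return statements


# Hypothetical task code containing one query and one update.
SAMPLE = '''
def process(cursor):
    cursor.execute("""SELECT id FROM orders
                      WHERE order_status = 'pending'""")
    cursor.execute("UPDATE orders SET order_status = 'reviewed' WHERE id = ?", (1,))
'''

found = extract_sql_statements(SAMPLE)
```

Queries assembled dynamically at run time would not be found by this static approach and would need separate handling.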
Step S03) Obtain the estimated load 31 of the computing task from the result data; the estimated load 31 includes the estimated time consumption, the estimated memory usage, and the estimated CPU resource usage.
Suppose the database query statement of computing task b returns 12,800 rows. Suppose further that each order record is about 1 KB, that processing one order takes t = 0.1 seconds (100 milliseconds), and that a distributed computing node 50 runs 10 concurrent tasks on average.
The processing time without overhead is then 12800×0.1÷10 = 128 seconds. Concurrency control involves task scheduling, lock management, and so on, and adds some time overhead. Assuming this overhead is 10% of the processing time, it amounts to 128 s×10% = 12.8 s.
The estimated time consumption of computing task b is therefore T = 12800×0.1÷10+12.8 = 140.8 seconds.
The result data contains 12,800 records with a total size of 12,800 records × 1 KB/record = 12,800 KB = 12.5 MB. Processing the orders may require some intermediate data structures, for example to hold address and phone number verification results. Their size depends on implementation details but is usually small. Assuming each order needs an extra 100 bytes of intermediate data, the total is 12,800 records × 100 bytes/record = 1,280,000 bytes, or about 1.25 MB. Each database connection and transaction also occupies some memory. Assuming each connection occupies 1 MB, 10 concurrent connections need 10 connections × 1 MB/connection = 10 MB.
The estimated memory usage of computing task b is therefore 12.5 MB (query results) + 1.25 MB (intermediate data structures) + 10 MB (database connections) = 23.75 MB.
The method of this specification executes the database query statement in advance and supplies the result data directly, so no database query needs to be executed while processing each order. The processing logic for each order includes address verification, phone number verification, inventory check, and so on. Assume that order processing requires 10% of the CPU resources. Concurrency control, which involves task scheduling and lock management, may add some CPU overhead; assume this overhead is 10% of the processing cost. The estimated CPU resource usage of computing task b is then 10%×(1+10%) = 11% of the CPU resources.
In this embodiment, the estimate of the estimated load 31 of a computing task is relatively coarse; a more accurate scheme for estimating the estimated load 31 is disclosed in later embodiments.
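The worked example for computing task b can be reproduced with a small calculation. The sketch below is illustrative only: the class `EstimatedLoad`, the function `estimate_load`, and all default parameter values are taken from the assumptions stated in the example, not from a prescribed formula. It converts bytes with strict binary units, so the intermediate-data term comes out at about 1.22 MB and the memory total at about 23.72 MB, slightly below the rounded 23.75 MB quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class EstimatedLoad:
    seconds: float       # estimated time consumption
    memory_mb: float     # estimated memory usage
    cpu_fraction: float  # estimated CPU resource usage (0.11 means 11%)


def estimate_load(row_count, seconds_per_row=0.1, concurrency=10, overhead=0.10,
                  row_kb=1.0, scratch_bytes_per_row=100, connection_mb=1.0,
                  cpu_per_order=0.10):
    """Coarse load estimate from the number of rows returned by the query."""
    base_seconds = row_count * seconds_per_row / concurrency
    seconds = base_seconds * (1 + overhead)          # add concurrency-control overhead
    result_mb = row_count * row_kb / 1024            # query results held in memory
    scratch_mb = row_count * scratch_bytes_per_row / (1024 * 1024)
    memory_mb = result_mb + scratch_mb + concurrency * connection_mb
    cpu_fraction = cpu_per_order * (1 + overhead)
    return EstimatedLoad(seconds, memory_mb, cpu_fraction)


load_b = estimate_load(12800)
```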
Step S04) Read the node load of the distributed computing nodes 50; the node load includes the current estimated queue waiting time, the memory usage, and the CPU resource usage.
Obtain the queue of computing tasks currently waiting on the distributed computing node 50 and sum the estimated time consumption of the tasks in the queue. Obtain the estimated time consumption of the computing task currently being executed on the node and, from the difference between the moment that task was scheduled onto the CPU and the current moment, obtain how long it has already been running. Subtracting the elapsed time from the estimated time consumption gives the estimated remaining time of the running task; adding the sum for the queued tasks gives the node's current estimated queue waiting time. The memory usage and CPU resource usage can be obtained directly from the resource manager of the distributed computing node 50, or the distributed computing nodes can report them to a designated server at a preset interval.
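The waiting-time calculation described above reduces to a few lines. In this sketch the function name and parameter names are assumptions, and clamping the remaining time at zero (for a task that has run longer than estimated) is an added safeguard not stated in the text.

```python
def estimated_wait_seconds(queued_estimates, running_estimate, running_started_at, now):
    """Estimated queue waiting time of a node.

    queued_estimates: estimated time consumption of each queued task (seconds)
    running_estimate: estimated time consumption of the task now executing
    running_started_at, now: timestamps in seconds
    """
    elapsed = now - running_started_at
    remaining = max(0.0, running_estimate - elapsed)  # never negative
    return remaining + sum(queued_estimates)


# Hypothetical node: a 100 s task started 30 s ago, two tasks queued.
wait = estimated_wait_seconds([140.8, 60.0], 100.0, running_started_at=1000.0, now=1030.0)
```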
Step S05) From the estimated load 31 and the node load, obtain all distributed computing nodes 50 that are suitable for each computing task, together with the estimated completion time of the computing task.
Taking computing task b as an example, a distributed computing node 50 is suitable for a computing task when the following conditions hold. The node's free memory after the current estimated queue waiting time is greater than the estimated memory usage of computing task b, or its free memory already exceeds that amount at the current moment. In addition, the node's occupied CPU resources can be released within a preset time threshold after the current estimated queue waiting time.
The estimated completion time of computing task b equals the current time plus the current estimated queue waiting time plus the estimated time consumption.
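The suitability test and the completion-time formula of step S05 can be sketched as follows. The `NodeLoad` fields are a simplified model assumed for this example: free memory is represented by a single figure expected once the queue has drained, and CPU availability by the delay until the CPU is released.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    name: str
    wait_seconds: float               # current estimated queue waiting time
    free_memory_mb_after_wait: float  # free memory expected once the queue drains
    cpu_release_delay_seconds: float  # delay after the wait until CPU is released


def is_eligible(node, task_memory_mb, cpu_release_threshold_seconds):
    """A node suits a task if memory suffices and CPU frees up soon enough."""
    return (
        node.free_memory_mb_after_wait > task_memory_mb
        and node.cpu_release_delay_seconds <= cpu_release_threshold_seconds
    )


def estimated_completion(now, node, task_seconds):
    """Current time + estimated queue waiting time + estimated time consumption."""
    return now + node.wait_seconds + task_seconds


# Hypothetical node evaluated against the estimate for computing task b.
node_1 = NodeLoad("Node 1", wait_seconds=270.8, free_memory_mb_after_wait=512.0,
                  cpu_release_delay_seconds=5.0)
finish_b = estimated_completion(0.0, node_1, 140.8)
```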
Step S06) Based on the estimated completion times, obtain the designated distributed computing node 50 for each computing task such that the difference between the estimated completion times at which the distributed computing nodes 50 finish all of their designated computing tasks is minimized.
Table 1 Estimated load of computing tasks
Table 2 Node load of distributed computing nodes
Tables 1 and 2 show that the memory and CPU resources of the distributed computing nodes 50 are sufficient, so nodes Node 1 to Node 3 are each suitable for every computing task. The final allocation is Node 1: Task b, Task f, Task i; Node 2: Task c, Task g; Node 3: Task d, Task e, Task h. The estimated completion time at which Node 1 finishes all of its designated computing tasks is 847.8 s; for Node 2 it is 762 s; and for Node 3 it is 745 s. The difference between Node 3 and Node 1 is 102.8 s, which is the allocation that minimizes the difference between the nodes' estimated completion times.
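The specification does not state which search procedure produces the allocation. As one possible approach, the sketch below uses a greedy longest-task-first heuristic that always gives the next task to the node with the earliest projected finish. This is a named substitute technique that tends to keep finish times close but does not guarantee the minimum difference. It assumes every node is suitable for every task, and the task and node figures are hypothetical rather than taken from Tables 1 and 2.

```python
import heapq


def assign_tasks(task_seconds, node_wait_seconds):
    """Greedy longest-task-first assignment.

    task_seconds: {task name: estimated time consumption}
    node_wait_seconds: {node name: current estimated queue waiting time}
    Returns (assignment, projected finish time per node).
    """
    heap = [(wait, name) for name, wait in node_wait_seconds.items()]
    heapq.heapify(heap)
    assignment = {name: [] for name in node_wait_seconds}
    ordered = sorted(task_seconds.items(), key=lambda item: item[1], reverse=True)
    for task, seconds in ordered:
        finish, name = heapq.heappop(heap)  # node that becomes free first
        assignment[name].append(task)
        heapq.heappush(heap, (finish + seconds, name))
    finish_times = {name: finish for finish, name in heap}
    return assignment, finish_times


# Hypothetical inputs, not the values of Tables 1 and 2.
assignment, finish_times = assign_tasks(
    {"t1": 50.0, "t2": 30.0, "t3": 20.0},
    {"n1": 0.0, "n2": 0.0},
)
spread = max(finish_times.values()) - min(finish_times.values())
```

An exhaustive or optimization-based search would be needed to guarantee the minimum spread when the number of tasks is small enough to afford it.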
The new technical effects achievable by this embodiment include the following. Executing the database query statements of the computing tasks in advance on the designated server allows the estimated load 31 of each computing task to be estimated relatively accurately from the number of result rows, which in turn helps schedule the distributed computing load more evenly. Reading the node load of the distributed computing nodes 50 and combining it with the estimated load 31 of the computing tasks minimizes the difference between the estimated completion times at which the nodes finish all of their designated computing tasks, so the results of the computing tasks can be merged sooner to produce the result of the target task, improving the efficiency of distributed computing. Executing the database query statements in advance on the designated server also avoids distributed nodes frequently establishing connections with the server, relieves database pressure, and helps improve the execution efficiency of the database.
Embodiment 2
Refer to FIG. 3. In this embodiment, the method of identifying the database query statements in the code, executing them, and obtaining result data includes the following steps:
Step S201) Read the code of multiple computing tasks at once and extract all database query statements in the code.
For example, the code of computing task b and computing task c is read at once, and the extracted database query statements of the two tasks are:
cursor.execute("""SELECT o.id, o.customer_name, o.shipping_address, o.phone,
    o.order_amount, oi.product_id, oi.quantity
    FROM orders o JOIN order_items oi ON o.id = oi.order_id
    WHERE o.product_id = 'get_product_id(Product A)'
    AND o.shipping_province != 'Province A/B/C'
    AND o.order_status = 'pending shipment'
    AND o.order_amount BETWEEN 0 AND 100 """)
orders = cursor.fetchall()
and
cursor.execute("""SELECT o.id, o.customer_name, o.shipping_address, o.phone,
    o.order_amount, oi.product_id, oi.quantity
    FROM orders o JOIN order_items oi ON o.id = oi.order_id
    WHERE o.product_id = 'get_product_id(Product A)'
    AND o.shipping_province != 'Province A/B/C'
    AND o.order_status = 'pending shipment'
    AND o.order_amount BETWEEN 100 AND 1000 """)
orders = cursor.fetchall()
After all database query statements in the code have been extracted, proceed to the next step.
Step S202) Identify the database query statements that target the same data table and place them in one group; fuse the query conditions of the statements in the group so that the statements themselves are fused into a single statement, called the fused query statement.
The database query statements of computing task b and computing task c target the same data table and are placed in one group. Fusing their query conditions merges the two statements into a single statement, called the fused query statement, which completes in one query and so saves one database query. Although the total number of result rows is unchanged, the fused query statement requires the database to filter the data only once, whereas the two original statements require it to filter twice. This improves the query efficiency of the database and reduces its workload.
For example, the fused query statement is:
cursor.execute("""SELECT o.id, o.customer_name, o.shipping_address, o.phone,
    o.order_amount, oi.product_id, oi.quantity
    FROM orders o JOIN order_items oi ON o.id = oi.order_id
    WHERE o.product_id = 'get_product_id(Product A)'
    AND o.shipping_province != 'Province A/B/C'
    AND o.order_status = 'pending shipment'
    AND o.order_amount BETWEEN 0 AND 1000 """)
orders = cursor.fetchall()
That is, the numeric ranges (0,100] and (100,1000] are fused into (0,1000]. The WHERE clause as a whole constitutes the query condition referred to in this specification, and o.order_amount BETWEEN 0 AND 1000 constitutes the condition on one column. After the fused query statement has been obtained, proceed to the next step.
Step S203) Connect to the database, execute the fused query statement, and obtain the fused query result data.
Executing the fused query statement returns all pending-shipment orders that contain only product A, have a delivery address in other provinces, and have an order amount in the interval (0,1000], as the fused query result data. In the programs of computing task b and computing task c, a statement is added that assigns the fused query result data to an array variable, and a filter statement is added inside the loop body that iterates over the elements of that array variable.
For computing task b, if the order amount is not in (0,100], the current iteration of the loop body is skipped, namely:
if order_amount > 100:
    continue
# Subsequent check statements follow
The continue statement skips processing of the current order record and moves on to the next one. For computing task c, the current iteration is likewise skipped if the order amount is not in (100,1000]. In this way the two computing tasks need only one database query. The efficiency lost by the computing tasks to the extra filtering is far smaller than the efficiency gained by saving a database query, so the overall efficiency of distributed computing improves.
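The per-task filtering of the fused result can also be expressed as a single helper that both tasks call with their own interval. This is a minimal sketch; the function name, the sample rows, and the assumption that order_amount is the fifth selected column (index 4, matching the SELECT list above) are introduced for illustration.

```python
def rows_for_task(fused_rows, low, high, amount_index=4):
    """Keep only rows whose order amount lies in the half-open interval (low, high]."""
    return [row for row in fused_rows if low < row[amount_index] <= high]


# Hypothetical fused result rows: (id, customer_name, shipping_address, phone,
# order_amount, product_id, quantity).
FUSED_ROWS = [
    (1, "customer 1", "address 1", "phone 1", 100.0, "A", 1),
    (2, "customer 2", "address 2", "phone 2", 100.5, "A", 2),
    (3, "customer 3", "address 3", "phone 3", 1000.0, "A", 1),
]

rows_task_b = rows_for_task(FUSED_ROWS, 0, 100)     # interval (0,100]
rows_task_c = rows_for_task(FUSED_ROWS, 100, 1000)  # interval (100,1000]
```

Because the intervals are half-open, an order of exactly 100 goes to computing task b only, so no order is processed twice.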
Refer to FIG. 4. The method of fusing the query conditions of the database query statements in the same group includes the following steps:
Step S301) When the conditions on multiple columns differ between the query conditions, execute the multiple database query statements within one database connection. When the condition on only one column differs, take the union of the numeric range conditions and the numeric value conditions on that column, fusing them into a fused numeric condition.
For example, when the database query statements differ only in the condition on o.order_amount, the numerical range conditions o.order_amount BETWEEN 0 AND 100 and o.order_amount BETWEEN 100 AND 1000 are fused into o.order_amount BETWEEN 0 AND 1000. The corresponding filter statement is then added after the code that references the result data.
The database query statements of computing task b and computing task d are a case in which the query conditions differ in more than one column. Here the statements are simply executed over a single database connection, which reduces the number of database connections and relieves some pressure on the database.
As another example, when the database query statements differ only in the condition on o.product_id, the numerical value conditions o.product_id='3012' and o.product_id='4504' are fused into o.product_id='3012' OR o.product_id='4504'.
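The union in step S301 can be sketched as below. The function names merge_ranges and merge_values are hypothetical, intervals are treated as half-open (low, high] as in the surrounding text, and the SQL fragment is built by string formatting purely for illustration (real code would normally bind values as query parameters).

```python
def merge_ranges(ranges):
    """Union of half-open intervals (low, high]; touching intervals are joined."""
    merged = []
    for low, high in sorted(ranges):
        if merged and low <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


def merge_values(column, values):
    """Union of equality conditions on one column, rendered as an OR chain."""
    unique = sorted(set(values))
    return " OR ".join(f"{column}='{v}'" for v in unique)


fused_range = merge_ranges([(0, 100), (100, 1000)])
fused_values = merge_values("o.product_id", ["3012", "4504"])
```

Intervals that neither overlap nor touch stay separate in the returned list, which a caller could render as several range conditions joined by OR.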
Step S302) A union is taken of the label value conditions on the same column in the query conditions, yielding a fused label condition.
As another example, when the database query statements differ only in the condition on o.shipping_province, the label value conditions o.shipping_province != 'Province A/B/C' and o.shipping_province = 'Province A' are fused into o.shipping_province != 'Province B/C'.
Step S303) The possible value range of the column corresponding to the fused numerical condition is obtained. When the ratio of the numerical range of the fused numerical condition to the possible value range exceeds a preset ratio threshold, the fused numerical condition is deleted.
For example, when o.order_amount BETWEEN 0 AND 100 and o.order_amount BETWEEN 100 AND 1000 are fused into o.order_amount BETWEEN 0 AND 1000, the fused numerical range is (0,1000]. Its ratio to the possible value range (0,20000], which is 5%, does not exceed the preset ratio threshold, so nothing is done.
As another example, when o.order_amount BETWEEN 100 AND 1000 and o.order_amount>1000 are fused into o.order_amount>100, the fused numerical range is (100,20000]. Its ratio to the possible value range (0,20000], which is 99.5%, exceeds the preset ratio threshold. In this case o.order_amount no longer needs to be filtered; that is, the condition on column o.order_amount is deleted from the database query statement.
In addition, the set of possible labels of the column corresponding to the fused label condition is obtained. When the fused label condition covers the set of possible labels, the fused label condition is deleted. For example, o.shipping_province = 'Province A', o.shipping_province = 'Province B', o.shipping_province = 'Province C', and o.shipping_province != 'Province A/B/C' together cover every possible value of o.shipping_province, so the fused query statement does not need to filter on o.shipping_province; that is, the condition on column o.shipping_province is deleted.
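The two deletion checks of step S303 can be sketched as follows. The text does not state the preset ratio threshold, so the value 0.9 is an assumption, as are the function names and the label "other" standing for every province outside A, B and C.

```python
RATIO_THRESHOLD = 0.9  # assumed value; the text only says "preset ratio threshold"


def range_ratio(fused, possible):
    """Share of the possible value range that the fused range covers."""
    return (fused[1] - fused[0]) / (possible[1] - possible[0])


def should_drop_numeric(fused, possible, threshold=RATIO_THRESHOLD):
    """True when the fused range covers so much that filtering is not worthwhile."""
    return range_ratio(fused, possible) > threshold


def should_drop_labels(fused_labels, possible_labels):
    """True when the fused label condition covers every possible label."""
    return set(possible_labels) <= set(fused_labels)


possible_amounts = (0, 20000)
keep_case = should_drop_numeric((0, 1000), possible_amounts)      # ratio 0.05
drop_case = should_drop_numeric((100, 20000), possible_amounts)   # ratio 0.995
labels_covered = should_drop_labels(["A", "B", "C", "other"], ["A", "B", "C", "other"])
```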
In addition, before the fused numerical condition is deleted, the following steps are performed (see FIG. 5):
Step S401) A preset check table is read; the check table records the probability distribution of each column.
Step S402) The probability distribution of the column corresponding to the fused numerical condition is obtained from the check table.
Step S403) When the probability that this distribution assigns to the numerical range of the fused numerical condition is below a preset threshold, deletion of the fused numerical condition is blocked; when it is not below the preset threshold, the fused numerical condition is deleted.
For example, the check table records that column o.shipping_province is uniformly distributed. Before the current distributed computing task, the distribution of the order amount o.order_amount for consumers who purchase only product A was derived from the historical sales order data of product A, so that the probability distribution of column o.order_amount could be recorded in the check table; suppose it is {'<=100':45%,'>100 AND<=1000':40%,'>1000':15%}. Suppose the preset threshold is 80%. When the fused numerical condition is o.order_amount>100, the probability of its numerical range is 55% (40% + 15%), which is below the preset threshold of 80%, so the fused numerical condition is not deleted.
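The check in steps S401 to S403 reduces to summing the bucket probabilities that the fused condition covers and comparing the sum with the threshold. The sketch below uses the figures from the example; the dictionary layout and the function names are assumptions, and percentages are kept as integers to avoid floating-point rounding.

```python
# Check table entry for o.order_amount, in percent, as given in the example.
order_amount_distribution = {"<=100": 45, ">100 AND <=1000": 40, ">1000": 15}
THRESHOLD_PERCENT = 80


def covered_probability(table, covered_buckets):
    """Sum the probability of every bucket the fused condition covers."""
    return sum(table[bucket] for bucket in covered_buckets)


def may_delete_condition(table, covered_buckets, threshold=THRESHOLD_PERCENT):
    """Deletion is allowed only when the covered probability reaches the threshold."""
    return covered_probability(table, covered_buckets) >= threshold


# o.order_amount > 100 covers the last two buckets.
covered = covered_probability(order_amount_distribution, [">100 AND <=1000", ">1000"])
allowed = may_delete_condition(order_amount_distribution, [">100 AND <=1000", ">1000"])
```

The range-ratio test alone would delete this condition (99.5% of the value range), but only 55% of orders are expected to fall in it, so keeping the filter still removes a large share of rows.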
In addition, the method for presetting the check table includes the steps of:
presetting several distribution functions;
selecting, according to the attributes of the column, one of the distribution functions as the distribution function of the column;
obtaining the probability distribution of the column from the selected distribution function.
Reading the order data already generated by the e-commerce platform and computing the probability distribution from it still consumes computing power. Therefore a person or a large model can select one of the distribution functions as the distribution function of the column directly, based on experience or on knowledge the large model has learned; a rough estimate is sufficient. For example, the order amount o.order_amount can simply be set to follow a normal distribution, and the probability of each amount interval is then computed to obtain the probability distribution. The rest is the same as in Embodiment 1.
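A rough sketch of deriving interval probabilities from an assumed normal distribution follows. It uses statistics.NormalDist from the Python standard library (Python 3.8 or later); the mean of 300 and standard deviation of 400 are invented for the illustration and do not come from the text.

```python
from statistics import NormalDist


def bucket_probabilities(mu, sigma, edges):
    """Probability of each interval delimited by the sorted edges.

    For edges [100, 1000] the buckets are <=100, (100, 1000] and >1000.
    """
    dist = NormalDist(mu, sigma)
    cdfs = [0.0] + [dist.cdf(edge) for edge in edges] + [1.0]
    return [high - low for low, high in zip(cdfs, cdfs[1:])]


# Hypothetical parameters for o.order_amount.
probabilities = bucket_probabilities(mu=300, sigma=400, edges=[100, 1000])
```

A normal distribution also assigns some probability to negative amounts; in this sketch that mass simply ends up in the lowest bucket, which is acceptable for the rough estimate the text calls for.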
Compared with Embodiment 1, the additional technical effects of this embodiment include:
Fusing query statements further reduces the number of database connections and database queries, relieves pressure on the database, shortens the time data processing tasks spend waiting for database responses, and improves the execution efficiency of data processing tasks. Deleting fused conditions that meet the relevant criteria improves the execution efficiency of the database.
Embodiment 3
In this embodiment, referring to FIG. 6, the method for obtaining the estimated time consumption of the computing task from the result data includes the following steps:
Step S501) A preset statement time-estimate table is read, and the estimated time consumption of each statement in the code of the computing task is obtained from it;
Step S502) The number of executions of each statement in the code of the computing task is obtained from the number of rows in the fused query result data;
Step S503) The estimated time consumption of the computing task is obtained from the estimated time consumption and the number of executions of each statement.
For example, the code of data processing task b includes the sqlite3.connect statement, conn.cursor statement, SELECT statement, for loop statement, is_address_complete statement, is_address_correct statement, is_phone_number_valid statement, and check_product_inventory statement. In the statement time-estimate table, the sqlite3.connect, conn.cursor, and SELECT statements each have an estimated time consumption of 100 ms because they wait on a database response. The estimate for the for loop statement is the number of statements in the loop body multiplied by 100 ms and then by the number of iterations. With 6 statements in the loop body, this is 600 ms multiplied by the number of iterations.
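The arithmetic of steps S501 to S503 for this example can be sketched as below. The function name and the row count of 50 are assumptions; the 100 ms per-statement figure, the three setup statements, and the six loop-body statements come from the example.

```python
STATEMENT_COST_MS = 100  # per-statement estimate used in the example


def estimate_task_time_ms(setup_statements, loop_body_statements, row_count,
                          cost_ms=STATEMENT_COST_MS):
    """Setup statements run once; loop-body statements run once per result row."""
    setup_time = setup_statements * cost_ms
    loop_time = loop_body_statements * cost_ms * row_count
    return setup_time + loop_time


# sqlite3.connect, conn.cursor and SELECT run once; the loop body has 6 statements.
estimated_ms = estimate_task_time_ms(setup_statements=3,
                                     loop_body_statements=6,
                                     row_count=50)  # hypothetical row count
```

With 50 rows the estimate is 300 ms of setup plus 600 ms per row, that is 30300 ms in total.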
Referring to FIG. 7, the method for obtaining the estimated memory space occupancy of the computing task from the result data includes the following steps:
Step S601) The storage space occupied by the code of the computing task and the storage space occupied by the result data are summed to obtain the total occupied storage space;
Step S602) The total occupied storage space is multiplied by a preset coefficient;
Step S603) When the product does not exceed a preset upper threshold, the product is used as the estimated memory space occupancy; when the product exceeds the preset upper threshold, the upper threshold is used as the estimated memory space occupancy.
The storage space occupied by the code is the size of the code needed to execute the computing task. The storage space occupied by the result data is the combined size of the result data obtained from database queries during execution, the intermediate data generated, and the final result data. The preset coefficient is an empirical coefficient that adjusts the total occupied storage space to reflect actual memory usage. For example, the coefficient can be set to 1.2.
The preset upper threshold is the maximum memory usage allowed on distributed computing node 50, which is generally the node's memory capacity. When the product exceeds the upper threshold, the estimated memory space occupancy takes the value of the upper threshold. In that case distributed computing node 50 uses virtual memory, treating part of its hard disk space as memory.
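Steps S601 to S603 amount to a scaled sum clamped to the node's limit. In the sketch below the function name, the megabyte unit, and the sample sizes (including the 16000 MB limit) are assumptions; only the coefficient 1.2 comes from the text.

```python
def estimate_memory_mb(code_mb, result_mb, upper_limit_mb, coefficient=1.2):
    """Scale the total footprint by the coefficient and clamp it to the limit."""
    product = (code_mb + result_mb) * coefficient
    return min(product, upper_limit_mb)


# Hypothetical sizes on a node that allows at most 16000 MB.
small_task = estimate_memory_mb(code_mb=2, result_mb=498, upper_limit_mb=16000)
large_task = estimate_memory_mb(code_mb=2000, result_mb=18000, upper_limit_mb=16000)
```

The first task stays under the limit (about 600 MB); the second would need about 24000 MB, so its estimate is capped at 16000 MB.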
Referring to FIG. 8, the method for obtaining the estimated CPU resource occupancy of the computing task from the result data includes the following steps:
Step S701) A preset intensive computing statement table is read; the table records intensive computing statements and the estimated per-execution CPU resource occupancy of each. The code of the computing task is compared against the table;
Step S702) When the code of the computing task contains no statement recorded in the intensive computing statement table, a preset CPU resource occupancy is used as the estimated CPU resource occupancy of the computing task;
Step S703) When the code of the computing task contains an intensive computing statement recorded in the table, the number of executions of that statement is obtained from the result data, and the estimated CPU resource occupancy of the computing task is obtained from the number of executions, the parallel configuration, and the estimated per-execution CPU resource occupancy.
In computing task b, the is_address_complete, is_address_correct, is_phone_number_valid, and check_product_inventory statements are intensive computing statements, because they perform a relatively large number of operations. The intensive computing statement table records the estimated per-execution CPU resource occupancy of these statements, which is obtained by averaging the CPU resource occupancy monitored over multiple historical executions. In a historical execution, either only the intensive computing statement is executed, or a control run is set up: one run executes the intensive computing statement together with other statements, another run executes only the same other statements, and comparing the two runs gives the CPU resource occupancy of the intensive computing statement. Because differences in the CPU resources used by ordinary statements have little effect, the preset CPU resource occupancy is used directly for them.
When an intensive computing statement is present, its number of executions is obtained from the result data, and the estimated CPU resource occupancy of the computing task is obtained from the number of executions, the parallel configuration, and the estimated per-execution CPU resource occupancy. When the parallel configuration is non-parallel, the estimated CPU resource occupancy equals the estimated per-execution CPU resource occupancy. When the parallel configuration generates the degree of parallelism automatically from the number of executions, the estimated CPU resource occupancy equals the estimated per-execution CPU resource occupancy multiplied by the degree of parallelism. For example, with 100 executions, a degree of parallelism of 3, and an estimated per-execution CPU resource occupancy of 5%, the estimated CPU resource occupancy is 15% (3×5%). As another example, with 10000 executions, a degree of parallelism of 10, and an estimated per-execution CPU resource occupancy of 5%, the estimated CPU resource occupancy is 50% (10×5%). The parallel configuration determines how many parallel threads distributed computing node 50 uses when a statement must be executed many times. The rest is the same as in Embodiment 1.
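The CPU estimate of steps S702 and S703 can be sketched as follows. The text gives no rule for turning an execution count into a degree of parallelism, so the sketch takes the degree of parallelism as an input; the function name and the default preset of 2% are assumptions.

```python
DEFAULT_CPU_PERCENT = 2  # assumed preset for tasks without intensive statements


def estimate_cpu_percent(per_execution_percent, parallel_count=1,
                         default_percent=DEFAULT_CPU_PERCENT):
    """Estimate CPU occupancy in percent.

    per_execution_percent is None when the task's code contains no statement
    from the intensive computing statement table.
    """
    if per_execution_percent is None:
        return default_percent
    return per_execution_percent * parallel_count


not_parallel = estimate_cpu_percent(5)              # 5%
three_threads = estimate_cpu_percent(5, 3)          # 100 executions in the example
ten_threads = estimate_cpu_percent(5, 10)           # 10000 executions in the example
ordinary_task = estimate_cpu_percent(None)          # falls back to the preset
```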
Compared with Embodiment 1, the additional technical effects of this embodiment include:
The improved estimates of time consumption, memory space occupancy, and CPU resource occupancy are more accurate, so computing tasks are scheduled more appropriately. This helps balance the distributed computing load and improves the efficiency of distributed computing.
In another aspect, this specification provides a distributed computing load balancing system which, referring to FIG. 9, includes:
a receiving module 100, which receives the multiple parallel-executable computing tasks into which a target task has been divided and reads the code of the computing tasks;
a query module 200, which identifies the database query statements in the code and executes them to obtain result data;
an estimation module 300, which obtains an estimated load 31 of each computing task from the result data, the estimated load 31 including the estimated time consumption, the estimated memory space occupancy, and the estimated CPU resource occupancy;
a reading module 400, which reads the node load of each distributed computing node 50, the node load including the current estimated queue waiting time, the memory space occupancy, and the CPU resource occupancy;
an adaptation module 500, which obtains, from the estimated load 31 and the node load, all distributed computing nodes 50 suited to each computing task, together with the estimated completion time of the computing task;
a designation module 600, which obtains the designated distributed computing node 50 for each computing task from the estimated completion times, such that the difference between the estimated completion times of all computing tasks of the target task is minimized.
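One way to picture the designation module is an exhaustive search over the candidate nodes of each task, keeping the combination with the smallest gap between the earliest and latest estimated completion time. This is only a sketch under stated assumptions: the input layout, the function name, and the sample figures are invented, each task's completion time on a node is treated as independent of the other tasks' choices, and the search is exponential in the number of tasks, so it suits small inputs only. The text does not say which search method is actually used.

```python
from itertools import product


def designate_nodes(candidates):
    """candidates: {task: {node: estimated completion time}}.

    Returns ({task: node}, spread) for the combination whose completion
    times are closest together.
    """
    tasks = list(candidates)
    best_choice, best_spread = None, None
    for combo in product(*(candidates[task].items() for task in tasks)):
        times = [finish for _, finish in combo]
        spread = max(times) - min(times)
        if best_spread is None or spread < best_spread:
            best_choice = {task: node for task, (node, _) in zip(tasks, combo)}
            best_spread = spread
    return best_choice, best_spread


# Hypothetical output of the adaptation module (completion times in seconds).
candidates = {
    "a": {"n1": 10, "n2": 14},
    "b": {"n1": 13, "n3": 20},
    "c": {"n2": 15, "n3": 11},
}
assignment, spread = designate_nodes(candidates)
```

Here the tightest combination finishes tasks a, b and c at 14, 13 and 15 seconds, so the tasks of the target task complete within 2 seconds of one another.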
FIG. 10 is a schematic structural diagram of an electronic device provided in an embodiment of this specification.
As shown in FIG. 10, the electronic device 1100 may include at least one processor 1101, at least one network interface 1104, a user interface 1103, a memory 1105, and at least one communication bus 1102. The communication bus 1102 provides connection and communication between these components. The user interface 1103 may include buttons and, optionally, a standard wired interface and a wireless interface. The network interface 1104 may include, but is not limited to, a Bluetooth module, an NFC module, a Wi-Fi module, and the like. The processor 1101 may include one or more processing cores. The processor 1101 connects the parts of the electronic device 1100 through various interfaces and lines, and performs the functions of the electronic device 1100 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 1105 and by calling the data stored in the memory 1105. Optionally, the processor 1101 may be implemented in at least one of the hardware forms DSP, FPGA, and PLA. The processor 1101 may integrate one or a combination of a CPU, a GPU, a modem, and the like. The CPU mainly handles the operating system, the user interface, and application programs; the GPU renders and draws the content to be shown on the display; the modem handles wireless communication.
It will be understood that the modem may also be implemented as a separate chip rather than being integrated into the processor 1101.
The memory 1105 may include RAM and may also include ROM. Optionally, the memory 1105 includes a non-transitory computer-readable medium. The memory 1105 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1105 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the method embodiments above, and so on; the data storage area may store the data involved in the method embodiments above. Optionally, the memory 1105 may also be at least one storage device located remotely from the processor 1101. As a computer storage medium, the memory 1105 may contain an operating system, a network communication module, a user interface module, and application programs. The processor 1101 may be used to call the application programs stored in the memory 1105 and execute the methods of the embodiments above.
An embodiment of this specification further provides a computer-readable storage medium storing instructions that, when run on a computer or processor, cause the computer or processor to perform the steps of the embodiments above. If the component modules of the electronic device are implemented as software functional units and sold or used as an independent product, they may be stored in the computer-readable storage medium.
An embodiment of this specification further provides a computer program product including a computer program that, when executed by a processor, implements the steps of the embodiments above.
Where they do not conflict, the technical features of the embodiments and implementations herein may be combined in any way.
The embodiments above may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly as a computer program product. The computer program product includes a plurality of computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this specification are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wireless means (for example infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates multiple available media. The available medium may be a magnetic medium (for example a floppy disk, hard disk, or magnetic tape), an optical medium (for example a Digital Versatile Disc (DVD)), or a semiconductor medium (for example a Solid State Disk (SSD)).
When implemented in hardware or firmware, the method flow described above is programmed into a hardware circuit to obtain the corresponding circuit structure and realize the corresponding function. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program the device themselves to "integrate" a digital system onto a single PLD, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of producing integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which resembles the software compilers used in program development; the source code to be compiled must likewise be written in a specific programming language, called a Hardware Description Language (HDL), of which there are many rather than just one. Those skilled in the art will also appreciate that a hardware circuit implementing the logical method flow can readily be obtained by describing the method flow in such a hardware description language and programming it into an integrated circuit.
The embodiments described above merely describe preferred implementations of this specification and do not limit its scope. Without departing from the design spirit of this specification, all variations and improvements made to its technical solutions by those of ordinary skill in the art shall fall within the scope of protection defined by the claims of this specification.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411310655.9A CN118838719B (en) | 2024-09-20 | 2024-09-20 | A distributed computing load balancing method and system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411310655.9A CN118838719B (en) | 2024-09-20 | 2024-09-20 | A distributed computing load balancing method and system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118838719A true CN118838719A (en) | 2024-10-25 |
| CN118838719B CN118838719B (en) | 2025-01-10 |
Family
ID=93143043
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411310655.9A Active CN118838719B (en) | 2024-09-20 | 2024-09-20 | A distributed computing load balancing method and system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118838719B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120353552A (en) * | 2025-04-10 | 2025-07-22 | 北京算立科技有限公司 | Computing power resource allocation scheduling method and system based on cloud computing platform |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104408106A (en) * | 2014-11-20 | 2015-03-11 | 浙江大学 | Scheduling method for big data inquiry in distributed file system |
| US20220012093A1 (en) * | 2015-10-28 | 2022-01-13 | Qomplx, Inc. | System and method for optimizing and load balancing of applications using distributed computer clusters |
| KR20220041576A (en) * | 2020-09-25 | 2022-04-01 | 주식회사 이노그리드 | Load balancing method and system for power efficiency in high-performance cloud service system using multiple computing nodes |
| US20220121633A1 (en) * | 2020-10-15 | 2022-04-21 | International Business Machines Corporation | Learning-based workload resource optimization for database management systems |
| CN115391045A (en) * | 2022-09-06 | 2022-11-25 | 中国科学院计算机网络信息中心 | A load balancing method and system based on machine learning assistance |
| CN118113458A (en) * | 2023-12-13 | 2024-05-31 | 天翼云科技有限公司 | SQL traffic scheduling method, system, medium and device based on machine learning |
| CN118245234A (en) * | 2024-05-28 | 2024-06-25 | 成都乐超人科技有限公司 | Distributed load balancing method and system based on cloud computing |
| CN118277105A (en) * | 2024-05-31 | 2024-07-02 | 天津南大通用数据技术股份有限公司 | Load balancing method, system and product for concurrent task distribution in distributed clusters |
-
2024
- 2024-09-20 CN CN202411310655.9A patent/CN118838719B/en active Active
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120353552A (en) * | 2025-04-10 | 2025-07-22 | Beijing Suanli Technology Co., Ltd. | Computing power resource allocation scheduling method and system based on cloud computing platform |
Also Published As
| Publication number | Publication date |
|---|---|
| CN118838719B (en) | 2025-01-10 |
Similar Documents
| Publication | Title |
|---|---|
| US12099483B2 (en) | Rules based scheduling and migration of databases using complexity and weight |
| US11847103B2 (en) | Data migration using customizable database consolidation rules |
| US11392561B2 (en) | Data migration using source classification and mapping |
| US9898522B2 (en) | Distributed storage of aggregated data |
| KR101959153B1 (en) | System for efficient processing of transaction requests related to an account in a database |
| US10715460B2 (en) | Opportunistic resource migration to optimize resource placement |
| US8732118B1 (en) | Distributed performance of data aggregation operations |
| CN102495857B (en) | Load balancing method for distributed database |
| US8635250B2 (en) | Methods and systems for deleting large amounts of data from a multitenant database |
| CN110019251A (en) | A kind of data processing system, method and apparatus |
| Yankovitch et al. | HYPERSONIC: A hybrid parallelization approach for scalable complex event processing |
| CN110058940B (en) | Data processing method and device in multi-thread environment |
| US11816511B1 (en) | Virtual partitioning of a shared message bus |
| CN103077197A (en) | Data storing method and device |
| CN109614270A (en) | Data reading and writing method, device, equipment and storage medium based on Hbase |
| CN120067223A (en) | Cross-engine data processing method and device and computer equipment |
| CN118838719A (en) | Distributed computing load balancing method and system |
| CN115129466B (en) | Hierarchical scheduling method, system, equipment and medium for cloud computing resources |
| CN113407108A (en) | Data storage method and system |
| JP2009037369A (en) | How to allocate resources to the database server |
| US10594620B1 (en) | Bit vector analysis for resource placement in a distributed system |
| CN118860587A (en) | Task processing method, device, electronic device, storage medium and program product |
| US11733899B2 (en) | Information handling system storage application volume placement tool |
| CN115827720A (en) | Big data query method and device, processor and electronic equipment |
| CN113055476B (en) | Cluster type service system, method, medium and computing equipment |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |