CN108259567B - A Discovery Method of Large Data Center Business Subsystem Based on Server Application Logic

Info

Publication number: CN108259567B
Application number: CN201711403565.4A
Authority: CN
Inventors: 林友芳; 武志昊; 万怀宇
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2017-12-22
Filing date: 2017-12-22
Publication date: 2020-09-29
Anticipated expiration: 2037-12-22
Also published as: CN108259567A

Abstract

The invention relates to the technical field of server operation and maintenance management, and in particular to a method for discovering business subsystems of a large data center based on server application logic. By constructing a system architecture diagram of the data center, the method overcomes the shortcomings of traditional operation and maintenance approaches: it does not require large amounts of manpower, material, and financial resources to collect and tabulate data; it does not require operation and maintenance personnel to have extensive experience with or knowledge of the data center's application system architecture; and it can automatically build an application-based business logic network, helping operation and maintenance personnel understand the server architecture and usage of the data center.

Description

A Discovery Method of Large Data Center Business Subsystem Based on Server Application Logic

Technical Field

The invention relates to the technical field of server operation and maintenance management, and in particular to a method for discovering business subsystems of a large data center based on server application logic.

Background

In recent years, the rapid growth in the number of servers in large data centers has put enormous pressure on their operation and maintenance departments. It is increasingly difficult for operation and maintenance staff to keep track of how the servers in the data center are actually used, which servers each business application involves, and the application logic relationships between those servers.

At present, for data centers running complex business systems, the traditional operation and maintenance approach relies purely on manual statistics and some basic operation and maintenance tools: server usage is registered and tabulated in a CMDB (Configuration Management Database), and the business application logic network of the servers is then drawn by hand from those statistics. This approach has many shortcomings. On the one hand, while a data center is in operation its business system architecture changes quickly and is highly complex, so the server usage records in the CMDB are often incomplete or outdated, making it hard for operation and maintenance staff to grasp the real architecture of the system accurately; manual statistics also require coordination across departments and consume a great deal of manpower and material resources. On the other hand, a server business application logic network drawn from manual statistics and the personal experience of individual staff cannot be handed over: once staff are reassigned, new staff who lack understanding of the business systems and the relevant experience find it hard to update the business application logic network.

Summary of the Invention

To address the shortcomings of the prior art described above, the present invention provides a method for discovering business subsystems of a large data center based on server application logic, so as to enable effective operation and maintenance management of large data center servers.

The present invention provides the following solution: a method for discovering business subsystems of a large data center based on server application logic, comprising the following steps:

S1: Parse and aggregate server data and TCP connection data from the raw log data of the large data center's servers, and import business system information data from an external CMDB. This specifically includes the following steps:

S1.1: Parse and aggregate the raw log data of the large data center's servers over a given time period, and extract server data from it. The server data includes at least the server name, the server's IP list, out-degree and in-degree, average weight, number of users, number of processes, memory and CPU usage, hourly connection distribution, server classification, and top-10 process information for CPU, VSZ, I/O, and IPCS;

S1.2: Extract TCP connection data from the raw log data. The TCP connection data includes at least the names of the servers at both ends of the TCP connection, the connection category, the connection weight, the hourly distribution of the connection, and the send and receive queues;

S1.3: Obtain business system information data from the CMDB. The business system information data includes at least the business system name, the mapping between business systems and servers, and the person in charge of each business system;

S2: Based on the data extracted in S1, aggregate the data information of each business system and the association data between business systems;

The data information of a business system specifically includes: the business system name, the person in charge of the business system, the number of servers in the business system, the number of connections S_{edge_sum}, the number of external connections S_{outside_edge}, the number of internal connections S_{inside_edge}, the out-degree S_{out_degree} and in-degree S_{in_degree}, the average number of connections S_{avg_edge}, and the average weight S_{avg_weight};

The number of connections S_{edge_sum} equals the sum of the number of external connections S_{outside_edge} and the number of internal connections S_{inside_edge}:

S_{edge_sum} = S_{outside_edge} + S_{inside_edge}

S_{outside_edge} counts external connections, that is, TCP connections with one endpoint server inside the business system and the other endpoint server outside it; S_{inside_edge} counts internal connections, that is, TCP connections whose two endpoint servers are both inside the business system;

The out-degree S_{out_degree} is the number of other business systems that this business system has accessed; the in-degree S_{in_degree} is the number of other business systems that have accessed this business system;

The average number of connections S_{avg_edge} is the ratio of the number of internal connections S_{inside_edge} to the number of servers inside the business system that have connections; the average weight S_{avg_weight} is the sum of the weights of the business system's internal connections divided by the number of internal connections:

S_{avg_weight} = Sum(W_{in_degree}) / S_{inside_edge}
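The per-system statistics defined in S2 can be illustrated with a minimal Python sketch. The input shapes are assumptions made for illustration only: `edges` is a list of `(client_vm, server_vm, weight)` tuples, `system_of` maps a server name to its business system (as the CMDB mapping in S1.3 would), and the function name `system_metrics` is hypothetical. Servers missing from the mapping are treated as outside every business system.

```python
from collections import defaultdict

def system_metrics(edges, system_of):
    """Compute the S2 statistics per business system.

    edges: iterable of (client_vm, server_vm, weight) tuples (assumed shape).
    system_of: dict mapping server name -> business system name (assumed shape).
    """
    acc = defaultdict(lambda: {"inside": 0, "outside": 0, "weight": 0.0,
                               "inside_vms": set(), "out": set(), "in": set()})
    for client, server, weight in edges:
        cs, ss = system_of.get(client), system_of.get(server)
        if cs is not None and cs == ss:
            # Internal connection: both endpoints are in the same business system.
            a = acc[cs]
            a["inside"] += 1
            a["weight"] += weight
            a["inside_vms"].update((client, server))
            continue
        # External connection: counted once for each known endpoint system.
        if cs is not None:
            acc[cs]["outside"] += 1
            if ss is not None:
                acc[cs]["out"].add(ss)   # cs accessed ss
        if ss is not None:
            acc[ss]["outside"] += 1
            if cs is not None:
                acc[ss]["in"].add(cs)    # ss was accessed by cs
    result = {}
    for name, a in acc.items():
        inside = a["inside"]
        result[name] = {
            "edge_sum": inside + a["outside"],
            "outside_edge": a["outside"],
            "inside_edge": inside,
            "out_degree": len(a["out"]),
            "in_degree": len(a["in"]),
            "avg_edge": inside / len(a["inside_vms"]) if a["inside_vms"] else 0.0,
            "avg_weight": a["weight"] / inside if inside else 0.0,
        }
    return result
```

Returning 0.0 when a business system has no internal connections is a choice made for this sketch; the text does not say how that case is handled.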

The association data between business systems includes: the associated server-side business system, the associated client-side business system, and the association weight S_{weight};

The association weight S_{weight} between two business systems equals the sum of the weights of the TCP connections between them;
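As a sketch of how S_{weight} might be accumulated, the Python snippet below sums connection weights per ordered pair of business systems. The input shapes (`edges` as `(client_vm, server_vm, weight)` tuples, `system_of` as a server-to-system dict) and the function name are assumptions for illustration. Keeping the two directions separate, keyed as (server-side system, client-side system), is also an assumption; the text only says the weights between the two systems are summed.

```python
from collections import Counter

def system_association_weights(edges, system_of):
    """Sum TCP connection weights between each pair of business systems.

    Returns a dict keyed by (server_side_system, client_side_system).
    """
    weights = Counter()
    for client, server, weight in edges:
        cs, ss = system_of.get(client), system_of.get(server)
        # Skip internal connections and endpoints unknown to the mapping.
        if cs is None or ss is None or cs == ss:
            continue
        weights[(ss, cs)] += weight
    return dict(weights)
```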

S3: Based on the data from S1 and S2, divide the servers inside each business system into layers;

S3.1: Find the border servers border_{vm} inside the business system. A border server border_{vm} is defined as a non-database server that has had a connection with a public network IP or with an external, non-monitored business system. The Level of each border server border_{vm} is marked as 0;

S3.2: Starting from the layer-0 border servers and using the servers' TCP connection data, find, among the servers in the business system that have not yet been assigned a layer, those that have been accessed by the layer-0 border servers. These form the next layer below the layer-0 border servers and their Level is marked as 1. A server that does not access any other server and only acts as a server side providing service to other servers is called a root server root_{vm}, and the Level of a root server root_{vm} is marked as -1;

S3.3: Repeat S3.2 taking the servers marked Level+1 as the starting point, until all servers in the business system have been assigned a layer;
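Steps S3.1 to S3.3 amount to a breadth-first traversal from the border servers. The Python sketch below illustrates this under stated assumptions: the border servers are already identified and passed in, `edges` holds `(client_vm, server_vm)` pairs, and whether a server "accesses no other server" is judged only from connections inside the same business system. The function name `assign_levels` is hypothetical.

```python
def assign_levels(servers, edges, border_vms):
    """Assign a Level to each server of one business system.

    servers: set of server names in the business system.
    edges: iterable of (client_vm, server_vm) pairs (assumed shape).
    border_vms: set of layer-0 border servers found in S3.1.
    Returns a dict server -> Level; root servers get -1.
    """
    accessed = {}
    for client, server in edges:
        if client in servers and server in servers:
            accessed.setdefault(client, set()).add(server)

    level = {vm: 0 for vm in border_vms}
    frontier = set(border_vms)
    current = 0
    while frontier:
        # Servers reached from the current layer that have no layer yet.
        reached = {target
                   for vm in frontier
                   for target in accessed.get(vm, ())
                   if target not in level}
        for vm in reached:
            # A server that accesses nothing is a root server (Level -1).
            level[vm] = current + 1 if accessed.get(vm) else -1
        frontier = {vm for vm in reached if level[vm] != -1}
        current += 1
    return level
```

Servers that cannot be reached from any border server stay unassigned in this sketch; the text does not describe how such servers are treated.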

S4: Based on the layering result of S3, group the servers in each layer according to the TCP connection data extracted in S1. For any two servers in the same layer, compute their similarity; if it reaches the threshold, the two servers are placed in the same group, otherwise in different groups;

The similarity similarity_{vm1&vm2} between two servers vm1 and vm2 is defined as the sum of the client Jaccard coefficient client_{coef} and the server Jaccard coefficient server_{coef}:

similarity_{vm1&vm2} = client_{coef} + server_{coef}

The client Jaccard coefficient client_{coef} is computed from server_{vm1}, the set of servers that vm1 has accessed as a client, and server_{vm2}, the set of servers that vm2 has accessed as a client. It equals the size of their intersection divided by the size of their union:

client_{coef} = |server_{vm1} ∩ server_{vm2}| / |server_{vm1} ∪ server_{vm2}|

The server Jaccard coefficient server_{coef} is computed from client_{vm1}, the set of servers that vm1 has served as a server side, and client_{vm2}, the set of servers that vm2 has served as a server side. It equals the size of their intersection divided by the size of their union:

server_{coef} = |client_{vm1} ∩ client_{vm2}| / |client_{vm1} ∪ client_{vm2}|
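The similarity defined above can be written as a short Python sketch. The `edges` input (a list of `(client_vm, server_vm)` pairs) and the function names are assumptions for illustration. Returning 0.0 when both sets are empty is also an assumption, since the text does not cover that case.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity(vm1, vm2, edges):
    """similarity = client_coef + server_coef, so the value lies in [0, 2]."""
    def servers_of(vm):   # servers that vm has accessed as a client
        return {s for c, s in edges if c == vm}

    def clients_of(vm):   # servers that vm has served as a server side
        return {c for c, s in edges if s == vm}

    client_coef = jaccard(servers_of(vm1), servers_of(vm2))
    server_coef = jaccard(clients_of(vm1), clients_of(vm2))
    return client_coef + server_coef
```

For example, if app1 accesses {db}, app2 accesses {db, cache}, and both are accessed only by lb, then client_{coef} = 1/2, server_{coef} = 1, and the similarity is 1.5.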

S5: Based on the business subsystems discovered by the layering and grouping of S3 and S4, build the data information of each group and the association data between groups;

S6: Based on the group associations built in S5, construct the data center business system architecture diagram.

The present invention has the following technical effects. By constructing a data center system architecture diagram, the proposed method for discovering business subsystems of a large data center based on server application logic overcomes the shortcomings of traditional operation and maintenance approaches: it does not require large amounts of manpower, material, and financial resources to collect and tabulate data; it does not require operation and maintenance personnel to have extensive experience with or knowledge of the data center's application system architecture; and it can automatically build an application-based business logic network, helping operation and maintenance personnel understand the server architecture and usage of the data center.

Description of the Drawings

FIG. 1 is a flow chart of constructing a business system architecture diagram with the method for discovering business subsystems of a large data center based on server application logic proposed by the present invention;

FIG. 2 is a business system architecture diagram finally obtained with the method for discovering business subsystems of a large data center based on server application logic proposed by the present invention.

Detailed Description

In the method for discovering business subsystems of a large data center based on server application logic proposed by an embodiment of the present invention, the processing flow for constructing a business system architecture diagram includes the following steps:

S1: Parse and aggregate server data and TCP connection data from the raw log data of the large data center's servers, and import business system information data from an external CMDB, including:

S1.1: Parse and aggregate the raw log data of the large data center's servers over a given time period. The time period is measured in days; in practice, data from several days may also be merged. The extracted server data includes the server name, the server's IP list, out-degree and in-degree, average weight, number of users, number of processes, memory and CPU usage, hourly connection distribution, server classification, and top-10 process information for CPU, VSZ, I/O, and IPCS.

The extracted server data includes the fields shown in Table 1 below:

Table 1

S1.2: Parse and aggregate the raw log data to extract TCP connection data, which includes the names of the servers at both ends of the TCP connection, the connection category, the connection weight, the hourly distribution of the connection, and the send and receive queues.

The extracted TCP connection data includes the fields shown in Table 2 below:

Table 2

No. | Field name | Field description
1 | id | Auto-increment id
2 | source_machine_id | Server acting as the server side of the TCP connection
3 | target_machine_id | Server acting as the client side of the TCP connection
4 | vm_edge_service_class | Connection category of the TCP connection
5 | weight | Weight of the TCP connection
6 | date | Date
7 | conn_dis_hour | Hourly distribution of the TCP connection
8 | type_conn | Connection distribution across connection categories
9 | sum_recvq | Receive queue of the TCP connection
10 | sum_sendq | Send queue of the TCP connection
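To make the shape of a Table 2 record concrete, the following Python sketch aggregates parsed connection observations into one record per server pair. Several details are assumptions, because the text does not specify them: the input rows are hypothetical `(client, server, hour)` tuples, the weight is taken to be the number of observations, and `conn_dis_hour` is represented as a list of 24 hourly counters. Only a subset of the Table 2 fields is modeled.

```python
from dataclasses import dataclass, field

@dataclass
class TcpEdge:
    source_machine_id: str   # server side of the connection (Table 2)
    target_machine_id: str   # client side of the connection (Table 2)
    weight: int = 0          # assumed here: number of observations
    conn_dis_hour: list = field(default_factory=lambda: [0] * 24)

def aggregate_connections(observations):
    """Aggregate (client, server, hour) rows into one TcpEdge per server pair.

    observations: iterable of hypothetical parsed log rows, hour in 0..23.
    Returns a dict keyed by (server, client).
    """
    edges = {}
    for client, server, hour in observations:
        key = (server, client)
        if key not in edges:
            edges[key] = TcpEdge(source_machine_id=server, target_machine_id=client)
        edges[key].weight += 1
        edges[key].conn_dis_hour[hour] += 1
    return edges
```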

S1.3: Obtain business system information data from the CMDB. The business system information data includes at least the business system name, the mapping between business systems and servers, and the person in charge of each business system.

The business system information data imported from the external CMDB includes the fields shown in Table 3 below:

Table 3

S2: Based on the data extracted in S1, aggregate the data information of each business system and the association data between business systems.

The data information of a business system specifically includes: the business system name, the person in charge of the business system, the number of servers in the business system, the number of connections, the number of external connections, the number of internal connections, the out-degree and in-degree, the average number of connections, and the average weight.

The aggregated data information of a business system includes the fields shown in Table 4 below:

Table 4

No. | Field name | Field description
1 | id | Auto-increment id
2 | system_name | Business system name
3 | header | Person in charge of the business system
4 | vm_num | Total number of servers in the business system
5 | edge_sum | Number of connections of the business system
6 | outside_edge | Number of external connections of the business system
7 | inside_edge | Number of internal connections of the business system
8 | out_degree | Out-degree of the business system
9 | in_degree | In-degree of the business system
10 | avg_edge | Average number of connections of the business system
11 | avg_weight | Average connection weight of the business system
12 | data | Date

The association data between business systems includes the associated server-side business system, the associated client-side business system, and the association weight.

The aggregated association data between business systems includes the fields shown in Table 5 below:

Table 5

No. | Field name | Field description
1 | id | Auto-increment id
2 | source_system_id | Server-side business system
3 | target_system_id | Client-side business system
4 | weight | Association weight
5 | date | Date

S3: Based on the data from S1 and S2, divide the servers inside each business system into layers.

S4: Based on the layering result of S3, group the servers in each layer according to the TCP connection data extracted in S1.
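The grouping in S4 could be sketched in Python as follows. The method places two servers in the same group when their similarity reaches the threshold, but it does not say what happens when A matches B and B matches C while A and C do not match. This sketch assumes such chains are merged into one group, using a union-find structure. The function name, the `sim` callable, and the threshold value are all illustrative.

```python
from itertools import combinations

def group_layer(vms, sim, threshold):
    """Group the servers of one layer by pairwise similarity.

    vms: iterable of server names in the layer.
    sim: callable (vm_a, vm_b) -> similarity score.
    threshold: pairs scoring at or above this value are merged.
    Returns a sorted list of sorted groups.
    """
    vms = sorted(vms)
    parent = {vm: vm for vm in vms}

    def find(vm):
        # Walk to the group representative, compressing the path on the way.
        while parent[vm] != vm:
            parent[vm] = parent[parent[vm]]
            vm = parent[vm]
        return vm

    for a, b in combinations(vms, 2):
        if sim(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for vm in vms:
        groups.setdefault(find(vm), []).append(vm)
    return sorted(groups.values())
```

A stricter alternative would require every pair inside a group to reach the threshold; which reading the method intends is not stated.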

S5: Based on the business subsystems discovered by the layering and grouping of S3 and S4, build the data information of each group and the association data between groups.

The data information of a group and the association data between groups include the fields shown in Table 6 and Table 7 below:

Table 6

No. | Field name | Field description
1 | id | Auto-increment id
2 | date | Date
3 | group_name | Group name
4 | system_id | Business system the group belongs to
5 | level_id | Layer of the group within the business system
6 | group_id | Group label
7 | machine_num | Number of servers in the group
8 | machine_list | List of servers in the group

Table 7

No. | Field name | Field description
1 | id | Auto-increment id
2 | date | Date
3 | source_group_id | Group acting as the server side in the association
4 | target_group_id | Group acting as the client side in the association
5 | weight | Association weight of the group association
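As an illustration of the Table 6 record, the Python sketch below turns a layering and grouping result into group records. The input shape (`layers` as a dict from level to a list of server groups), the function name, and the `group_name` format are assumptions; the text does not describe how group names or labels are generated. The auto-increment `id` is left to the storage layer.

```python
def build_group_records(system_id, date, layers):
    """Build Table 6 style records for one business system.

    layers: dict mapping level_id -> list of groups, each a list of server names.
    """
    records = []
    for level_id in sorted(layers):
        for group_id, machines in enumerate(layers[level_id]):
            records.append({
                "date": date,
                # Hypothetical naming scheme, not taken from the source.
                "group_name": f"{system_id}-L{level_id}-G{group_id}",
                "system_id": system_id,
                "level_id": level_id,
                "group_id": group_id,
                "machine_num": len(machines),
                "machine_list": sorted(machines),
            })
    return records
```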

S6: Based on the group associations built in S5, construct the data center business system architecture diagram, as shown in FIG. 2.
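The text does not say how the architecture diagram is rendered. One possible approach, shown here purely as an assumption, is to emit the groups and their associations as a Graphviz DOT description that an external tool can lay out. The function name and input shapes are hypothetical: `groups` is a list of dicts with `group_name` and `machine_num`, and `associations` is a list of `(client_group, server_group, weight)` tuples.

```python
def to_dot(groups, associations):
    """Render groups and group associations as Graphviz DOT text.

    groups: list of dicts with "group_name" and "machine_num" keys.
    associations: list of (client_group, server_group, weight) tuples;
    each arrow points from the client group to the group it accesses.
    """
    lines = ["digraph architecture {", "  rankdir=TB;"]
    for g in groups:
        name = g["group_name"]
        lines.append(f'  "{name}" [label="{name} ({g["machine_num"]})"];')
    for client, server, weight in associations:
        lines.append(f'  "{client}" -> "{server}" [label="{weight}"];')
    lines.append("}")
    return "\n".join(lines)
```

Drawing arrows from client to server is a presentation choice in this sketch; Table 7 stores the server-side group as the source and the client-side group as the target.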

In summary, the method for discovering business subsystems of a large data center based on server application logic proposed by the embodiments of the present invention constructs a business system architecture diagram and thereby overcomes the shortcomings of traditional operation and maintenance approaches: it does not require large amounts of manpower, material, and financial resources to collect and tabulate data; it does not require operation and maintenance personnel to have extensive experience with or knowledge of the data center's application system architecture; and it can automatically build an application-based business logic network, helping operation and maintenance personnel understand the business system architecture and usage of the data center.

The business system architecture diagram constructed by the present invention reflects how business application systems are actually deployed on the servers and assists operation and maintenance personnel in their management work. The only input the invention requires is the server log snapshot data of the data center. From the parsed and aggregated server log snapshot data, the invention can automatically discover the business system architecture diagram; the results are accurate, little manual operation is needed, and substantial manpower and material costs are saved.

Claims

1. A method for discovering business subsystems of a large data center based on server application logic, characterized by comprising the following steps:

S1: parsing and aggregating server data and TCP connection data from the raw log data of the large data center's servers, and importing business system information data from an external CMDB, specifically including the following steps:

S1.1: parsing and aggregating the raw log data of the large data center's servers over a given time period, and extracting server data from the raw log data, wherein the server data includes at least the server name, the server's IP list, out-degree and in-degree, average weight, number of users, number of processes, memory and CPU usage, hourly connection distribution, server classification, and top-10 process information for CPU, VSZ, I/O, and IPCS;

S1.2: extracting TCP connection data from the raw log data, wherein the TCP connection data includes at least the names of the servers at both ends of the TCP connection, the connection category, the connection weight, the hourly distribution of the connection, and the send and receive queues;

S1.3: obtaining business system information data from the CMDB, wherein the business system information data includes at least the business system name, the mapping between business systems and servers, and the person in charge of each business system;

S2: based on the data extracted in S1, aggregating the data information of each business system and the association data between business systems;

wherein the data information of a business system specifically includes: the business system name, the person in charge of the business system, the number of servers in the business system, the number of connections S_{edge_sum}, the number of external connections S_{outside_edge}, the number of internal connections S_{inside_edge}, the out-degree S_{out_degree} and in-degree S_{in_degree}, the average number of connections S_{avg_edge}, and the average weight S_{avg_weight};

the number of connections S_{edge_sum} equals the sum of the number of external connections S_{outside_edge} and the number of internal connections S_{inside_edge}:

S_{edge_sum} = S_{outside_edge} + S_{inside_edge}

S_{outside_edge} counts external connections, that is, TCP connections with one endpoint server inside the business system and the other endpoint server outside it; S_{inside_edge} counts internal connections, that is, TCP connections whose two endpoint servers are both inside the business system;

the out-degree S_{out_degree} is the number of other business systems that the business system has accessed; the in-degree S_{in_degree} is the number of other business systems that have accessed the business system;

the average number of connections S_{avg_edge} is the ratio of the number of internal connections S_{inside_edge} to the number of servers inside the business system that have connections; the average weight S_{avg_weight} is the sum of the weights of the business system's internal connections divided by the number of internal connections:

S_{avg_weight} = Sum(W_{in_degree}) / S_{inside_edge}

wherein the association data between business systems includes: the associated server-side business system, the associated client-side business system, and the association weight S_{weight};

the association weight S_{weight} between two business systems equals the sum of the weights of the TCP connections between the two business systems;

S3: based on the data from S1 and S2, dividing the servers inside each business system into layers;

S3.1: finding the border servers border_{vm} inside the business system, wherein a border server border_{vm} is defined as a non-database server that has had a connection with a public network IP or with an external, non-monitored business system, and the Level of each border server border_{vm} is marked as 0;

S3.2: starting from the layer-0 border servers and using the servers' TCP connection data, finding, among the servers in the business system that have not yet been assigned a layer, those that have been accessed by the layer-0 border servers, which form the next layer below the layer-0 border servers with Level marked as 1, wherein a server that does not access any other server and only acts as a server side providing service to other servers is called a root server root_{vm}, and the Level of a root server root_{vm} is marked as -1;

S3.3: repeating S3.2 taking the servers marked Level+1 as the starting point, until all servers in the business system have been assigned a layer;

S4: based on the layering result of S3, grouping the servers in each layer according to the TCP connection data extracted in S1, by computing whether the similarity between any two servers in the layer reaches a threshold, placing the two servers in the same group if it does and in different groups otherwise;

the similarity similarity_{vm1&vm2} between two servers vm1 and vm2 is defined as the sum of the client Jaccard coefficient client_{coef} and the server Jaccard coefficient server_{coef}:

similarity_{vm1&vm2} = client_{coef} + server_{coef}

the client Jaccard coefficient client_{coef} is computed from server_{vm1}, the set of servers that vm1 has accessed as a client, and server_{vm2}, the set of servers that vm2 has accessed as a client, and equals the size of their intersection divided by the size of their union:

client_{coef} = |server_{vm1} ∩ server_{vm2}| / |server_{vm1} ∪ server_{vm2}|

the server Jaccard coefficient server_{coef} is computed from client_{vm1}, the set of servers that vm1 has served as a server side, and client_{vm2}, the set of servers that vm2 has served as a server side, and equals the size of their intersection divided by the size of their union:

server_{coef} = |client_{vm1} ∩ client_{vm2}| / |client_{vm1} ∪ client_{vm2}|

S5: based on the business subsystems discovered by the layering and grouping of S3 and S4, building the data information of each group and the association data between groups;

S6: based on the group associations built in S5, constructing the data center business system architecture diagram.