CN114124503B

CN114124503B - An Intelligent Network Awareness Method for Level-by-Level Concurrent Cache Optimizing Efficiency

Info

Publication number: CN114124503B
Application number: CN202111350286.2A
Authority: CN
Inventors: 韩道岐; 陆月明; 王东滨; 杨键; 王皓; 吕陆琴
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2021-11-15
Filing date: 2021-11-15
Publication date: 2022-09-27
Anticipated expiration: 2041-11-15
Also published as: CN114124503A

Abstract

The invention discloses an intelligent network awareness method that optimizes performance through level-by-level concurrency and caching, and belongs to the field of network security. Network card (NIC) cores are paired with high-capacity memory to build large-capacity on-chip scratchpad memory (SPM) storage, which caches the traffic obtained by port mirroring. Based on DPDK, multiple physical network cards collect traffic concurrently in separate partitions, and data packets are received and distributed through two-level lock-free concurrency and a user-mode protocol stack. Then, through per-protocol field parsing and concurrent detection, IP protocol data fields are extracted, filtered, and cached, and are distributed through three channels to subsystems with different capabilities for detection. Finally, the results of the various detections are fused, abnormal samples are saved, normal traffic is sampled, and rich contextual information is associated. The invention meets the deployment requirements of a high-performance security situational awareness system in future smart home network environments.

Description

An Intelligent Network Awareness Method for Level-by-Level Concurrent Cache Optimizing Efficiency

Technical Field

The invention belongs to the technical field of network security, and in particular relates to an intelligent network awareness method that optimizes performance through level-by-level concurrency and caching.

Background

The proliferation and growing intelligence of IoT devices, together with the large-scale adoption of cloud services, have driven the rapid rise of new edge computing architectures. By building large-scale autonomous IoT systems close to the user, the new architecture provides user-oriented services such as unified device management, data caching, privacy protection, and intelligent services, giving users a single point of management and decision-making. Users can use intelligent, high-performance computing devices to build multiple redundant local aggregation service centers that address response time, battery life, intelligent management of networked sensors, bandwidth savings, and data security and privacy.

However, traditional intrusion detection and situational awareness systems suffer from high false-alarm rates and poor detection of vulnerabilities and application-layer threats. A new generation of situational awareness and information security assessment models is needed that can comprehensively sense and fuse different types of data across multiple dimensions, including network traffic, logs, assets, and internal management data (existing structured data such as status, audit, and relationship records), and that, after associating extended context information, improves attack detection, anomaly notification, and correlated early warning. Such a model should also target future 6G high-performance smart home security gateways, connecting to the mirror port and the home LAN to collect network traffic data at high speed.

A lightweight intrusion detection and prevention system built this way can be deployed and integrated into IoT edge devices to extract features and build user profiles across all dimensions. To suit the massive time-series data sources of the IoT, lightweight big data management with layering and sharding is also needed, so that data is stored with progressive desensitization across memory, local disk, and cloud disk. Detection models and matching datasets suited to the freshness requirements of different time-series periods are selected iteratively, giving the system the ability to evolve automatically and adapt to different environments.

As edge computing matures, its real-time, close-to-user character can already support the security requirements of real-time detection and response in the home. 6G high-speed gateways will become key home network equipment. Using the gateway's mirror port and a transparent connection to the home LAN, basic network traffic, asset data, and logs can be collected; scattered data and redundant multi-source alarms can be fused; packet contents can be parsed in depth; and the overall security situation of gateways, hosts, security devices, mobile devices, and IoT devices in the future smart home can be presented and predicted. Research on a next-generation intelligent network security situational awareness model in which all parties act in coordination is urgently needed.

Summary of the Invention

The centralized collection, detection, and storage methods of existing centralized network situational awareness systems ignore users' data rights and experience as well as environmental constraints, and cannot effectively provide real-time situational awareness and threat assessment in a smart home environment. The present invention therefore proposes an intelligent network awareness method that optimizes performance through level-by-level concurrency and caching. In an AIoT (artificial intelligence Internet of Things) environment, it performs lightweight network traffic collection, detection, analysis, and storage in parallel based on edge computing, addressing the performance, adaptability, interpretability, and layering of network security situational awareness in the smart home.

The network traffic collection, detection, and storage method comprises the following steps:

Step 1. On an SoC-based embedded computing platform, a single network card core is paired with a single CPU core to form an independent partition. In each partition, several large-capacity on-chip scratchpad memory (SPM) blocks are built from integrated memory and are accessed directly by memory address. This forms the home Internet communication gateway device.

Each SPM block stores 2M of data, including network card data and data processed by the CPU. Several SPM blocks form a cache partition sized to the packet processing rate of the network card type in use. Data stored in an SPM block is accessed directly by memory address, and a start address plus payload length allows continuous streaming access across blocks without passing over the data bus. This provides zero-copy direct data access in user mode and loss-free continuous streaming reads from a large cache.
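The addressing scheme can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name SpmPartition and its methods are hypothetical, "2M" is read as 2 MiB, and Python byte arrays stand in for the on-chip memory. The model shows how a start address and a payload length locate data that spans several blocks; it does not reproduce the zero-copy behavior, because read() copies bytes.

```python
BLOCK_SIZE = 2 * 1024 * 1024  # assumption: "2M" is read as 2 MiB per SPM block


class SpmPartition:
    """Toy model of a cache partition built from fixed-size SPM blocks."""

    def __init__(self, num_blocks, block_size=BLOCK_SIZE):
        self.block_size = block_size
        # One bytearray per block; real SPM would be a mapped memory region.
        self.blocks = [bytearray(block_size) for _ in range(num_blocks)]

    def _locate(self, address):
        # Split a flat address into (block index, offset inside that block).
        return divmod(address, self.block_size)

    def write(self, address, payload):
        view = memoryview(payload)
        while len(view):
            idx, off = self._locate(address)
            n = min(len(view), self.block_size - off)
            self.blocks[idx][off:off + n] = view[:n]
            address += n
            view = view[n:]

    def read(self, address, length):
        # Stream `length` bytes from `address`, crossing block boundaries.
        out = bytearray()
        while length > 0:
            idx, off = self._locate(address)
            n = min(length, self.block_size - off)
            out += self.blocks[idx][off:off + n]
            address += n
            length -= n
        return bytes(out)
```

For example, with 8-byte blocks a 6-byte payload written at address 6 occupies the last two bytes of block 0 and the first four bytes of block 1, and read(6, 6) returns it as one contiguous payload.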

Step 2. The home Internet communication gateway device is connected to each networked device in the home. Each network card collects the traffic of different devices in parallel, and the traffic is processed concurrently by multiple user-mode threads based on the Data Plane Development Kit (DPDK).

The specific process is as follows:

First, one main distribution process receives packets for each network card. The fields of each received packet's 5-tuple are sorted, and a string hash is computed to build a unique concurrency identifier, the UFID:

UFID = shash(sort(srcip, srcport, dstip, dstport), protocol)    (1)

Here srcip, srcport, dstip, and dstport are the source IP address, source port, destination IP address, and destination port. The sort step ensures that packets in both directions of a flow concatenate these four fields into a string in the same order. The protocol type field is then appended, and the shash algorithm computes the hash ID of the string.
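A minimal sketch of formula (1) follows. Two points are assumptions rather than part of the text: the sort is applied to the two (address, port) endpoints so that each port stays with its address, and zlib.crc32 stands in for the shash function, which the text does not specify. The function name ufid is illustrative.

```python
import zlib


def ufid(srcip, srcport, dstip, dstport, protocol):
    """Direction-independent flow identifier in the spirit of formula (1)."""
    # Sort the two endpoints so both directions build the same string.
    a, b = sorted([(srcip, srcport), (dstip, dstport)])
    key = f"{a[0]}:{a[1]}-{b[0]}:{b[1]}|{protocol}"
    # zlib.crc32 is a stand-in for the unspecified shash algorithm.
    return zlib.crc32(key.encode("ascii"))
```

A request from 10.0.0.2:51000 to 93.184.216.34:443 and the reply in the opposite direction produce the same identifier, which is the property the sort step is meant to provide.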

Then, according to the number of concurrent channels C, the hash computed for a flow assigns the packets in both directions of that flow to the same queue Q_i, and the queues deliver the packets to the processing sub-processes.
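The queue assignment can be sketched as a modulo over the flow identifier, assuming that queue index i is the identifier modulo C. The function name dispatch and the use of collections.deque are illustrative; a DPDK deployment would more likely use lock-free rings between the distribution process and the workers.

```python
from collections import deque


def dispatch(packets, num_channels):
    """Place each (flow_id, payload) pair into queue Q_i, i = flow_id mod C."""
    queues = [deque() for _ in range(num_channels)]
    for flow_id, payload in packets:
        queues[flow_id % num_channels].append(payload)
    return queues
```

Because both directions of a flow share one identifier, they always land in the same queue, so a single worker sees the whole conversation in arrival order.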

Step 3. The processed traffic packets are distributed in parallel to the three-channel subsystems for detection, according to IP protocol detection and routing configuration.

After each sub-process extracts, filters, and caches the IP protocol data fields of its packets, three subsystems process the data, detecting and fusing records of abnormal network behavior in parallel. The three-channel subsystems are the application-layer anomaly rule detection subsystem, the network-layer traffic feature and anomaly model detection subsystem, and the PCAP and forensic file detection subsystem.

The specific steps are as follows:

Step 301. The processed traffic packets are distributed through an IP tunnel to the application-layer anomaly rule detection subsystem, which extracts the metadata used by the defined rules and the application-layer protocol metadata, and performs rule-matching detection by evaluating logical expressions.

The metadata includes the TCP/IP 5-tuple, the fields of each protocol, and statistical fields such as upstream and downstream byte counts.

Application-layer protocol metadata includes, for HTTP for example, the URL, request code, response code, submitted data, and response data.

Rule-matching detection finds abnormal behavior with distinctive traffic and protocol field characteristics and raises alarms in real time.
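Rule matching by logical expressions can be sketched as a list of named predicates evaluated over a metadata record. The rule names, field names, and thresholds below are all hypothetical and chosen only for illustration; the text does not define a rule language.

```python
RULES = [
    # (rule name, logical expression over a metadata dict); all illustrative.
    ("large_upload",
     lambda m: m["up_bytes"] > 10_000_000 and m["dst_port"] != 443),
    ("http_server_error",
     lambda m: m.get("http_status", 0) >= 500),
]


def match_rules(metadata, rules=RULES):
    """Return the names of all rules whose expression is true for a record."""
    return [name for name, predicate in rules if predicate(metadata)]
```

A record with 20 MB of upstream traffic to port 8080 and HTTP status 200 matches only the large_upload rule, which would then raise an alarm.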

Step 302. The extracted application-layer protocol metadata is forwarded through a distributed message queue to machine learning models. The network-layer traffic feature and anomaly model detection subsystem performs near-real-time deep detection and assigns fuzzy user-behavior labels based on semantics and similarity, such as malicious behavior, abnormal data transfer, and information probing.

Specifically, for network traffic that carries malicious behavior at the application layer or triggers its execution, multiple analysis models are established, including feature field scanning, state machine inference, machine learning classification, anomalous semantics, and anomalous logic flow, and the application behavior type of the traffic is labeled along multiple dimensions.

Step 303. In parallel, the IP protocol data fields in the extracted metadata are stored directly in a time-series database for stream processing and the computation of various statistical indicators. The indicators are forwarded through the distributed message queue to machine learning models, and the network-layer traffic feature and anomaly model detection subsystem performs long-period anomaly detection and classification of network behavior, finding unknown anomalies that show up in time-series statistics.

Step 304. PCAP files are backed up through a locally created virtual network card. The PCAP files are scanned according to subscription rules, resource files transferred over the network are extracted as evidence, and these files are forwarded through the distributed message queue to the PCAP and forensic file detection subsystem for virus detection.

At the same time, a classified historical archive of the data resource files each user sends and receives is kept as evidence.

Step 4. The detection results output by the three-channel subsystems are fused iteratively.

The iterative fusion process is as follows:

First, the detection results output by the three-channel subsystems are collected periodically. The training data is augmented through dimensionality reduction, vectorization, sparsification, and reconstruction; detection and generative deep learning models are trained; and reinforcement learning and generative adversarial algorithms are used to build an environment in which the models keep learning on their own.

Then the trained deep learning models are ranked and selected by high accuracy and low latency. The weights of older historical models are reduced geometrically, and time-series exponential moving average weighted fusion is applied to obtain the fused detection result.
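The geometric down-weighting of older models can be sketched as follows. The decay factor alpha, the function names, and the use of a single anomaly score per model are assumptions made for illustration; the text states only that older historical models receive geometrically smaller weights.

```python
def ema_weights(num_models, alpha=0.5):
    """Normalized weights for models ordered oldest to newest."""
    # A model of age a (0 = newest) gets raw weight (1 - alpha) ** a.
    raw = [(1 - alpha) ** age for age in range(num_models - 1, -1, -1)]
    total = sum(raw)
    return [w / total for w in raw]


def fuse(scores, alpha=0.5):
    """Weighted average of per-model anomaly scores, oldest model first."""
    weights = ema_weights(len(scores), alpha)
    return sum(w * s for w, s in zip(weights, scores))
```

With three models and alpha = 0.5 the weights are 1/7, 2/7, and 4/7, so the newest model contributes four times as much as the oldest.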

Fusion is organized around three themes:

1) Multi-source deep learning models obtained from the network layer, application layer, and user and relationship layer are combined by ensemble learning with Bayesian inference;

2) Relationship sets and composite alarm event sets are built with a knowledge graph;

3) Typical behavior labels are built from profiles of typical user behavior, and single-event detection results are grouped and labeled as group event sets.

Finally, a Bayesian network is built from each model's detection accuracy on historical samples, the joint detection probability of each class is inferred for new samples, and the deep learning models are continually updated with the results.

Step 5. The fused detection result dataset is queried on a schedule, and the PCAP and forensic files are processed.

First, retention periods are defined for events of different security levels, which determines the storage requirements of the data. Then, for result samples that must be kept long term, the related files are filtered, condensed, and indexed to produce condensed files and sample indexes.

Step 6. Daily traffic statistics, sampled normal traffic, and detection results are transferred in batches to a big data platform and stored by category, and event data of each security level is gradually purged according to its defined life cycle.
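The life-cycle cleanup in steps 5 and 6 can be sketched as a retention table keyed by security level. The level names and retention periods below are hypothetical; the text says only that each level has a defined retention period.

```python
from datetime import date, timedelta

# Hypothetical retention periods, in days, per security level.
RETENTION_DAYS = {"critical": 365, "warning": 90, "normal": 7}


def expired(events, today):
    """Return the events whose retention period has elapsed."""
    return [
        e for e in events
        if today - e["day"] > timedelta(days=RETENTION_DAYS[e["level"]])
    ]
```

A daily cleanup job would call expired() with the current date and delete the returned records, so low-severity samples leave storage first.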

The advantages of the invention are:

1) The method uses IoT and edge computing technology to distill big data and perform intelligent detection on the user side. It implements concurrency and caching at every level of network data stream processing, which reduces energy consumption, improves system performance, and meets the deployment requirements of a high-performance security situational awareness system in future smart home network environments.

2) With a network security situational awareness model in which all parties act in coordination, the method pushes security services down to each edge network, closer to the specific user needs of each type of environment.

3) The method collects and processes the protocol metadata of each layer defined by the Open Systems Interconnection (OSI) reference model, analyzes communication protocols and statistical features layer by layer, and establishes experimentally the relationship between features and common attack types. It proposes a high-performance hardware implementation and a software extension mechanism, and implements concurrency and caching at every level of network data stream processing.

4) The method proposes three-channel subsystems that concurrently extract the features of each layer and select matching models for detection. Iterative fusion of multi-source, multi-model detection results improves detection accuracy, the interpretability of results, and adaptability to environmental change.

Description of the Drawings

Fig. 1 is a flow chart of the intelligent network awareness method of the invention;

Fig. 2 is a structural diagram of the level-by-level concurrent network traffic collection, detection, and storage system implemented by the invention;

Fig. 3 is a functional diagram of the physical device of the invention, based on 5G edge computing gateway technology;

Fig. 4 is a flow chart of the lightweight collection, detection, analysis, and storage processing of the invention;

Fig. 5 is a flow chart of the iterative fusion of multi-source, multi-model detection results of the invention.

Detailed Description

The invention is described in detail below with reference to the drawings and embodiments.

The invention proposes a solution for lightweight network traffic collection and intelligent analysis of the network security situation, based on edge computing and aimed at the AIoT network environment of the smart home. To meet the high-performance requirements of the IoT, it partitions work concurrently level by level and miniaturizes data collection, detection, and storage, solving the detection, fusion, grading, and response of abnormal network events.

The network traffic collection, detection, and storage method is shown in Fig. 1 and comprises the following steps:

Step 1. On an SoC-based embedded computing platform, a single network card core is paired with a single CPU core to form an independent partition. In each partition, several scratchpad memory (SPM) blocks are built from integrated high-speed, high-capacity memory and are accessed directly by memory address. This forms the home Internet communication gateway device.

Each network card core has its own CPU core. The SPM organizes data in blocks of 2M each, and 2K to 20K SPM blocks form a large cache partition sized to the packet processing rate of the network card type in use.

Each SPM block directly stores the data handled by the network card and the CPU. Subsequent processing accesses the data in each SPM block directly by memory address, and a start address plus payload length allows continuous access across blocks without passing over the data bus.

A dedicated addressing channel is established between each SPM and the compute core that owns it. The SPM is mapped as huge-page memory and serves as both the network card buffer and the user-mode buffer. The on-chip network card core captures and copies data into the SPM region only once; subsequent processing uses the SPM transparently through NUMA awareness and the huge-page mechanism, achieving zero data copying and reducing the latency and power consumption that copying would cause.

Step 2. The home Internet communication gateway device is connected to each networked device in the home. Each network card collects the traffic of different devices in parallel, and the traffic is processed concurrently by multiple user-mode threads based on DPDK.

Using DPDK, traffic from multiple physical network cards is collected per partition with two-level lock-free concurrency. The specific process is as follows:

Each process and each processing sub-thread is bound to its corresponding CPU core, which reduces process and thread scheduling overhead.
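Core binding can be illustrated with the operating system's affinity interface. The sketch below uses Python's os.sched_setaffinity, which is available on Linux only, and is not how DPDK itself pins its worker cores; the helper name pin_to_core is hypothetical.

```python
import os


def pin_to_core(core=None):
    """Pin the calling process to one CPU core and return that core.

    Returns None on platforms without the affinity interface.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)  # cores this process may run on
    if core is None or core not in allowed:
        core = min(allowed)  # fall back to the lowest permitted core
    os.sched_setaffinity(0, {core})
    return core
```

Once pinned, the scheduler no longer migrates the worker between cores, which keeps its cache contents warm and avoids migration overhead.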

The hash algorithm was chosen by performance comparison as the one that costs the least CPU time. Packets are received through two-level lock-free concurrency and a user-mode protocol stack, so high-speed traffic above 10G can be split into dozens of parallel processing flows, making full use of general-purpose multi-core processors.

The MID obtained after the modulo operation can serve as the partition ID for subsequent deep packet inspection, the sensor ID partition of the time-series database, the stream ID for PCAP file retrieval, and the pre-partition ID of the big data platform, supporting traffic splitting and concurrent processing in each stage.

After each sub-process extracts, filters, and caches the IP protocol data fields of the traffic packets, the data is distributed through three channels to subsystems with different capabilities, which detect and fuse records of abnormal network behavior in parallel. The three channels are the compute-intensive application-layer anomaly rule detection subsystem, the memory-intensive network-layer traffic feature and anomaly model detection subsystem, and the storage-intensive PCAP and forensic file detection subsystem.

A batch of data is condensed through extraction, filtering, and caching and forwarded asynchronously to multiple models, and records of abnormal network behavior are detected and fused in parallel. The specific steps are as follows:

Step 301. The processed traffic packets are distributed through an IP tunnel to the application-layer anomaly rule detection subsystem, which extracts in real time the metadata used by the defined rules and the application-layer protocol metadata, and performs preliminary rule-matching detection by evaluating logical expressions.

Rule-matching detection finds abnormal behavior with distinctive traffic and protocol field characteristics, such as port scanning, DDoS, and brute-force attacks.

Step 302. The extracted application-layer protocol metadata is forwarded through a distributed message queue to machine learning models, and the network-layer traffic feature and anomaly model detection subsystem performs near-real-time deep detection.

Specifically, for network traffic that carries malicious behavior at the application layer or triggers its execution, such as malicious code, viruses and Trojans, and application-layer exploits, multiple analysis models are established, including feature field scanning, state machine inference, machine learning classification, anomalous semantics, and anomalous logic flow, and the application behavior type of the traffic is labeled along multiple dimensions.

Step 303. In parallel, the extracted IP protocol data fields are stream-processed, and the resulting flow features, the correlation statistics for time, host, and service, and the 5-tuple are stored periodically in a time-series database. They are forwarded through the distributed message queue to machine learning models, and the network-layer traffic feature and anomaly model detection subsystem performs long-period anomaly detection and classification of network behavior.

The statistical features are long-period indicators over multiple minutes, hours, days, weeks, months, and years: linear combinations of the mean, standard deviation, information entropy, and random-process probability density distribution of statistical fields such as packet length and frequency.
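As a small illustration of such indicators, the sketch below computes the mean, the population standard deviation, and the Shannon entropy of the packet lengths observed in one time window. The function name window_features and the choice of packet length as the field are assumptions for the example.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def window_features(packet_lengths):
    """Mean, population standard deviation, and Shannon entropy in bits."""
    counts = Counter(packet_lengths)
    total = len(packet_lengths)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "mean": mean(packet_lengths),
        "std": pstdev(packet_lengths),
        "entropy": entropy,
    }
```

A window holding two 60-byte and two 1500-byte packets has mean 780, standard deviation 720, and entropy of 1 bit, since the two lengths are equally likely.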

Step 304. PCAP backup files are written through a locally created virtual network card. The PCAP files are scanned according to subscription rules, resource files transferred over the network are extracted as evidence, and these files are forwarded through the distributed message queue to the PCAP and forensic file detection subsystem for virus detection.

PCAP files store the raw network packets. After files of various types are extracted from them, they are distributed through the distributed message queue to the PCAP and forensic file detection subsystem. At the same time, a classified historical archive of the data resource files each user sends and receives is kept as evidence, which supports querying, forensics, and tracing, and helps detect abnormal leaks of user data resource files.

Step 4. The detection results output by the three channels are fused iteratively as multi-source, multi-model results.

The iterative fusion process is as follows:

Step 401. A self-learning, adaptive twin environment is established that continually extracts typical packets and typical statistical features from the environment, trains iteratively, and strengthens the detection capability of the models.

In the adaptive environment, typical samples of each analysis theme produced in the real environment are appended using an automatic periodic sliding window. Typical packets are extracted with similarity algorithms such as clustering, compression learning algorithms such as rule and model transfer, and incentive algorithms such as adversarial generation. Samples and features are extracted through dimensionality reduction, vectorization, sparsification, and reconstruction. Attack, defense, and honeypot nodes are set up to actively produce and label real samples from the actual environment. Detection and generative deep learning models are trained with a deep-learning-based generative adversarial network (GAN), which produces reinforced samples through multi-stage iterative adversarial training. The training and validation sets are partitioned, the test set is refined step by step, and the models gain a daily self-learning capability.

Step 402. In the newly built twin environment, the deep learning detection models trained in step 401 are ranked to find the best ones for data of different sources and features. Their detection results are then fused: ensemble learning is performed by Bayesian inference, composite alarm event sets are derived, and single-event detection results are labeled as belonging to group event sets.

According to the naive Bayes inference formula, the product of each model's output probability and the class prior can be computed for every class, and the class with the largest resulting posterior probability is taken as the decision.

Ensemble learning is further derived as a loop that alternates Gaussian process probability parameter estimation with sampling of typical samples. Relationship sets and composite alarm event sets are built with a knowledge graph. Typical behavior labels are built from profiles of typical user behavior, and single-event detection results are grouped and labeled as group event sets. Multiple alarms are merged, using association rules modeled from expert knowledge or derived automatically from statistics, into a relationship graph in the form of a knowledge graph. This graph exposes structural features such as themes, core events, closeness of relationships, and the attack process, producing hierarchical fusion results automatically and reducing the detail that must be reviewed and analyzed.
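Merging related alarms into composite events can be sketched as finding connected components in a relationship graph. This is an illustrative simplification: the function name composite_events is hypothetical, and the association rules are reduced to a precomputed list of related alarm pairs, whereas the text describes rules from expert knowledge or statistics.

```python
from collections import defaultdict


def composite_events(alarms, related):
    """Group alarm ids into connected components of the relation graph."""
    graph = defaultdict(set)
    for a, b in related:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in alarms:
        if start in seen:
            continue
        # Depth-first walk collecting every alarm reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups
```

Given alarms a, b, c, and d with a related to b and b related to c, the result is one composite event {a, b, c} and one isolated event {d}, so an analyst reviews two items instead of four.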

Suppose there are n models of different types. For ensemble learning to classify a test sample X, an ensemble-model training set of m samples must be built first. As m grows, the accuracy of the ensemble model keeps improving, while the detection computation time stays fixed. Let the K categories be C_k (k = 1, 2, …, K), and let the number of training samples in each category be m_k; the prior probability of category C_k is:

P(Y = C_k) = (m_k + λ) / (m + Kλ)    (2)

where λ is a discount factor between 0 and 1.
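As a minimal sketch of equation (2), the smoothed prior can be computed directly from per-category sample counts. The function name, the category names and the counts below are illustrative assumptions, not values from the source.

```python
def smoothed_prior(class_counts, lam=0.5):
    """Equation (2): P(Y = C_k) = (m_k + lam) / (m + K * lam).

    class_counts maps each category C_k to its sample count m_k.
    lam is the discount factor, assumed to lie between 0 and 1.
    """
    m = sum(class_counts.values())   # total number of training samples
    k = len(class_counts)            # number of categories K
    return {c: (m_k + lam) / (m + k * lam) for c, m_k in class_counts.items()}


# Hypothetical counts for three categories.
counts = {"normal": 90, "scan": 8, "dos": 2}
priors = smoothed_prior(counts, lam=0.5)
```

The smoothing keeps the prior of a rarely seen category above zero, so a category with few samples is not ruled out before any model has voted.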

The accuracy of model A in identifying category C_k on the training set is defined as the conditional probability:

For an instance X, each model n_j runs detection once, yielding the probability P(n_j(X) | Y = C_k) for the models that predict X as category C_k. The prior probability of category C_k, the product of the conditional probabilities of the models voting for C_k, and the class probability with which the models predict X are multiplied together, and the category with the maximum posterior probability is taken as the class C of X:
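The exact fusion formula is not reproduced in this text, so the following is only a sketch of a standard naive Bayes combination of classifier votes, which may differ in detail from the patented formula. It assumes each model has a table of P(predicted class | true class) estimated on the ensemble training set; the model names `stat` and `dl`, the table values and the function name are hypothetical.

```python
import math


def fuse_naive_bayes(priors, confusion, votes):
    """Combine model votes with naive Bayes.

    priors:    {class: P(Y = class)}
    confusion: {model: {true_class: {predicted_class: P(pred | true)}}}
    votes:     {model: predicted_class} for one instance X
    Returns (best_class, normalized posterior per class).
    All probabilities are assumed to be strictly positive.
    """
    log_scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for model, predicted in votes.items():
            score += math.log(confusion[model][c][predicted])
        log_scores[c] = score
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    posterior = {c: v / z for c, v in unnormalized.items()}
    return max(posterior, key=posterior.get), posterior


# Hypothetical tables for a statistical model and a deep learning model.
priors = {"normal": 0.9, "attack": 0.1}
confusion = {
    "stat": {"normal": {"normal": 0.95, "attack": 0.05},
             "attack": {"normal": 0.30, "attack": 0.70}},
    "dl":   {"normal": {"normal": 0.90, "attack": 0.10},
             "attack": {"normal": 0.20, "attack": 0.80}},
}
label, posterior = fuse_naive_bayes(priors, confusion, {"stat": "attack", "dl": "attack"})
```

Working in log space avoids underflow when many models vote, and normalizing at the end gives a posterior that can be compared across instances.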

Based on the activity characteristics and ordering of the kill chain stages, the stage to which each detection result event belongs is inferred. Once the complete activity flow is established, association analysis and iterative feature clustering partition the events into different group event sets.

Step 403: For data over long time spans and different time-series period ranges, rank the set of optimal detection models found, reduce the number of classification labels by clustering, and shrink the weights of older historical models geometrically so as to perform time-series exponential moving average weighted fusion learning.

Models trained on recent data better fit the patterns of the current environment, which steadily improves the system's ability to detect sudden anomalies and to adapt promptly to environmental changes;

Attacks and security problems in cyberspace emerge continuously and evolve in a strongly periodic, iterative fashion, so the models and samples must keep pace in order to detect unknown attacks promptly and improve the recognition accuracy for new attacks. Samples are partitioned by time into weeks, months, quarters, years, three years and so on, and multiple period models mp are trained on them, which balances freshness and coverage and reduces conflicts. Accumulating data over different historical periods also allows iterative training and selection of the optimal model mpm for each period. During real-time detection, the following algorithm fuses the decision probabilities of each model with weights from the most recent to the most distant, exponentially decaying older models, to make fast decisions:

During post-event comprehensive detection, the Bayesian inference method of step 402 can also be applied to build a broad ensemble-model training sample set and fuse all the models mp trained by time period, to make a more accurate judgment.
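The weighting algorithm for real-time detection is not reproduced in this text. A minimal sketch of one common form of exponentially decaying weighted fusion is given here, assuming geometric weights alpha * (1 - alpha)^i with i = 0 for the most recent period model; the function name, the value of `alpha` and the sample probabilities are illustrative assumptions.

```python
def ema_fuse(probs_recent_first, alpha=0.5):
    """Fuse per-period model probabilities, most recent period first.

    Model i receives weight alpha * (1 - alpha) ** i, so older period
    models decay geometrically; weights are normalized to sum to 1.
    """
    if not probs_recent_first:
        raise ValueError("at least one model probability is required")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    weights = [alpha * (1 - alpha) ** i for i in range(len(probs_recent_first))]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs_recent_first)) / total


# Hypothetical attack probabilities from the weekly, monthly and quarterly models.
fused = ema_fuse([0.9, 0.6, 0.2], alpha=0.5)
```

Because the weights depend only on the position of each period model, fusion costs one multiplication per model, which suits the fast-decision path described above.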

The fusion function is organized into three topics:

1) Ensemble learning through Bayesian inference;

Models are integrated with weights along the time dimension, for example by applying the Monte Carlo method and clustering to dynamically fuse and reduce classification labels and adjust the models' fitness weights, gradually self-learning to adapt to the new environment. Joint inference uses multi-classifier integration; for normal/abnormal classification, for example, a statistical model first obtains abnormal samples accurately, a deep learning model then discovers potential unknown anomalies, and whitelist rules finally filter the abnormal samples to lower the false positive rate. Finally, Bayesian inference is applied: a Bayesian network is modeled on the models' detection accuracy over historical samples, the joint detection probability of each class is inferred for new samples, and the network is continuously updated with the results.

A composite alarm event merges multiple alarms, through association rules modeled from expert knowledge or derived automatically from statistics, into a relationship graph in the form of a knowledge graph. This graph exposes structural features such as topics, core events, closeness of relationships and the attack process. The automatic hierarchical fusion results reduce the details that need attention and analysis.

User portraits serve to derive user behavior labels, so that users can recognize the patterns of their typical daily activities from network data. Statistics such as frequency, volume of data sent and received, and duration can be computed to form probabilistic hourly and daily distribution indicators such as mean, standard deviation, information entropy and Gaussian process. When user behavior changes, these indicators fluctuate markedly, and the occurrence probability, mean and standard deviation range of each activity over time can also be observed.
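The portrait indicators can be illustrated with a short sketch that computes the mean, standard deviation and information entropy of an hourly traffic series and flags a value that departs from the baseline. The three-sigma threshold, the function names and the byte counts are assumptions for illustration; the source does not specify a threshold.

```python
import math
from statistics import mean, pstdev


def profile(hourly_bytes):
    """Mean, standard deviation and Shannon entropy of an hourly series."""
    total = sum(hourly_bytes)
    entropy = 0.0
    if total > 0:
        entropy = -sum((b / total) * math.log2(b / total)
                       for b in hourly_bytes if b > 0)
    return {"mean": mean(hourly_bytes),
            "std": pstdev(hourly_bytes),
            "entropy": entropy}


def deviates(value, prof, z=3.0):
    """True if value lies more than z standard deviations from the mean."""
    if prof["std"] == 0:
        return value != prof["mean"]
    return abs(value - prof["mean"]) / prof["std"] > z


# Hypothetical bytes sent in four consecutive hours by one device.
baseline = profile([90, 110, 100, 100])
flat = profile([100, 100, 100, 100])
```

A uniform series such as `flat` has maximal entropy for its length, so a sudden drop in entropy signals traffic concentrating in a few hours.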

First, the retention periods for events of different security levels are defined and the storage requirements of the data are divided. Then, for the result samples that need long-term retention, the related files are filtered, abridged and indexed to form abridged files and sample indexes, reducing the size of stored files and speeding up retrieval of historical data;

Data such as traffic statistics, sampled normal traffic and detection results are transferred to the big data platform in daily batches, and the abridged files are uploaded to the big data platform in daily batches.

Associations are established between data and files, and contextual label data is extracted to enrich topic and relationship information. The accumulated historical data, combined with the automated training process, continuously improves model detection capability. Data is stored by category, and event data of different security levels is progressively purged according to the defined life cycle.
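Life-cycle cleanup by security level can be sketched as a filter over stored events. The level names and retention periods below are hypothetical; the source defines that such periods exist but gives no values.

```python
from datetime import datetime, timedelta

# Hypothetical retention periods per security level (not from the source).
RETENTION = {
    "low": timedelta(days=30),
    "medium": timedelta(days=180),
    "high": timedelta(days=3 * 365),
}


def purge_expired(events, now):
    """Keep only events still inside the retention period of their level."""
    return [e for e in events if now - e["time"] <= RETENTION[e["level"]]]


now = datetime(2022, 1, 1)
events = [
    {"id": 1, "level": "low", "time": datetime(2021, 11, 1)},   # 61 days old, expired
    {"id": 2, "level": "high", "time": datetime(2021, 11, 1)},  # same age, retained
    {"id": 3, "level": "low", "time": datetime(2021, 12, 20)},  # 12 days old, retained
]
kept = purge_expired(events, now)
```

Running such a filter once per day matches the daily batch transfer described above, so cleanup and upload can share one schedule.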

Embodiment:

Figure 2 shows an example of the system structure provided by this embodiment. First, a home Internet communication gateway device is set up; through mirroring configuration it obtains all traffic at the Internet egress and the traffic of multiple subnet segments inside the home. Next comes the mirrored network traffic collection and processing module: on an SoC-based embedded computing platform, high-speed, high-capacity memory and network card cores are integrated to build a large-capacity on-chip storage resource, the SPM, which caches the mirrored traffic. Based on DPDK, multiple physical network cards collect traffic concurrently by partition. Packets are received and distributed through two-level lock-free concurrency and a user-mode protocol stack. Then the protocol field parsing and concurrent detection processing module extracts, filters and caches IP protocol data fields and distributes them through three channels to subsystems of different capabilities for detection. Finally, the localized sample library and model performance analysis module fuses the results of the various detections, saves abnormal samples, samples normal traffic, and associates rich contextual information. Samples are enriched through attack and defense environment simulation and deep-learning adversarial generation. Models are trained and their performance evaluated, and good models are continuously improved and integrated. Meanwhile, the cloud-platform-based sample and file abridging and index storage module uploads the samples and forensic files found to need long-term storage to the cloud platform's database and distributed storage every day, where they are stored by retention duration and cleaned up automatically.

Figure 3 shows the functional diagram of the physical device based on 5G edge computing gateway technology provided by this embodiment. A high-performance 5G communication gateway deployed in the smart home mirrors Internet traffic and internal home network traffic; multiple groups of agent collection processes concurrently distribute it to three kinds of computing and detection models, forming multi-layer, multi-mode feature data sets. The iterative optimization layer of the models continuously trains and integrates multiple models to form a joint inference capability.

The flow of network traffic collection, detection, analysis and storage in this embodiment is shown in Figure 4; the specific steps are as follows:

Step 301: The embedded computing platform uses on-chip storage resources;

On the SoC platform, high-speed, high-capacity memory and network card cores are integrated to provide a large-capacity on-chip storage resource, the SPM. Network traffic is buffered under the control of the on-chip network card core, and data is copied into the SPM region only once. The user-mode process that reads network packets can be bound to the CPU of the corresponding region and access the SPM region transparently through the huge page memory mechanism.

Step 302: Two-level concurrent lock-free traffic collection;

Based on DPDK, user-mode processes are bound to multiple mirrored network cards on different network segments and collect traffic concurrently by partition. A unique concurrency identifier, the UFID, is built by hashing the string formed from the sorted 5-tuple of each network packet. Given the number of concurrent channels C, the UFID is taken modulo C to select the owning queue Qi, and the packet is distributed to a processing subprocess through that queue.
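The UFID computation and queue selection can be sketched as follows. The source names only a string hash over the sorted 5-tuple; the use of BLAKE2b, the string layout, the pairing of each address with its port before sorting, and the function names are assumptions made for this illustration.

```python
import hashlib


def ufid(srcip, srcport, dstip, dstport, protocol):
    """Direction-independent flow id built from the sorted 5-tuple.

    The two endpoints are sorted as (ip, port) pairs so that packets of
    both directions of one flow produce the same string. BLAKE2b stands
    in for the unspecified string hash of the source.
    """
    first, second = sorted([(srcip, srcport), (dstip, dstport)])
    key = f"{first[0]}:{first[1]}|{second[0]}:{second[1]}|{protocol}"
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def queue_index(flow_id, channels):
    """Select queue Qi as the UFID modulo the number of channels C."""
    return flow_id % channels


# Hypothetical flow between a home device and a web server.
forward = ufid("192.168.1.10", 51000, "203.0.113.5", 443, "tcp")
reverse = ufid("203.0.113.5", 443, "192.168.1.10", 51000, "tcp")
qi = queue_index(forward, 8)
```

Since both directions map to the same queue, one subprocess sees the whole conversation and no lock is needed to share per-flow state between workers.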

Step 303: Three-channel concurrent detection;

After a subprocess finishes extracting, filtering and caching IP protocol data fields, it distributes them through three channels to the following subsystems of different capabilities for detection.

1. Traffic is distributed through an IP tunnel to the application layer abnormal rule detection subsystem. The subsystem extracts the metadata on which rules are defined and the application layer protocol metadata and performs preliminary rule matching detection; once application layer data is obtained, it is forwarded through a distributed message queue to machine learning models for further in-depth detection and user behavior relationship analysis;

2. The network layer traffic characteristic and anomaly model detection subsystem. The extracted IP protocol data fields are stored directly in a time series database on the internal network for flow processing and computation of various statistics. The processed flow features and correlation statistics on time, host, service and so on are forwarded through a distributed message queue to machine learning models for anomaly detection and attack classification;

3. The PCAP and forensic file detection subsystem. PCAP backup files are written to disk through a locally created virtual network card; the PCAP files are scanned according to subscription rules to extract, as evidence, the resource files transmitted over the network, which are forwarded through a distributed message queue to a file detection model for virus detection.

Step 304: Multi-source, multi-mode iterative fusion;

New samples are collected and labeled automatically: adapting to the environment, an automatic periodic sliding window appends typical samples of each analysis topic produced by the environment. Attack, defense and honeypot nodes are set up as auxiliaries to actively produce and label real samples in the actual environment. Two kinds of deep learning models, detection and generation, are trained, and multi-stage iterative adversarial training produces reinforced samples. The training set and validation set are partitioned, and the test set is gradually refined.

For data of different sources and characteristics, models are ranked to find the optimal detection model. For data of different time-series period ranges, models are ranked to find the set of optimal detection models for each range. During real-time detection, fusion learning uses time-series exponential moving weights. During post-event comprehensive detection, ensemble learning uses Bayesian inference.

Step 305: Abridging and associative indexing of result data;

The fused detection result data set is queried on a schedule, which starts the processing of PCAP and forensic files. The result samples that need long-term retention are filtered, abridged and indexed to form abridged files and sample indexes.

Step 306: Daily asynchronous batch transfer, reuse and categorized storage of data;

Data such as traffic statistics, sampled normal traffic and detection results are transferred to the big data platform in daily batches. The abridged files are uploaded to the big data platform in daily batches. Data is stored by category according to the retention period requirements, and event data of different security levels is progressively purged according to the defined life cycle.

In this embodiment, the flow for iteratively fusing multi-source, multi-mode detection results while continuously adapting, updating and enriching the sample set is shown in Figure 5, with the following steps:

Step 401: Establish a mechanism for automatic collection and labeling of new samples;

Typical samples of each analysis topic produced by the actual environment are collected periodically. Attack, defense and honeypot nodes are set up to actively produce and label real samples in the actual environment. Two kinds of deep learning models, detection and generation, are trained, and multi-stage iterative adversarial training produces reinforced samples.

Step 402: Analyze and extract typical samples;

Deduplicate and fuzzify the samples. Use cluster analysis to extract the cluster-center samples that match the labels and to find outliers. Use tree-model classification analysis and principal component analysis (PCA) to obtain key features, and cluster similar samples by those key features.

Step 403: Partition time-series period data sets and extract features;

New time-series period data sets are built by week, month, quarter, year and three years, and the feature processing program is run on the extracted sample data to form feature data sets.

Step 404: Extract 70% of the data as the training set and validation set.

Step 405: Train multiple machine learning models and save the parameters of the trained models.

Step 406: Extract 30% of the data as the typical test data set;

Step 407: Evaluate the models obtained for each time-series period on the test data set, and rank and select models by the criteria of high accuracy and low time consumption.
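Steps 404 to 407 can be sketched as a 70/30 split followed by a ranking of evaluated models. The random shuffle, the tie-breaking order (accuracy first, then evaluation time) and the model names are assumptions; the source states the two criteria but not how they are combined.

```python
import random


def split_70_30(samples, seed=0):
    """Shuffle and split samples into 70% train/validation and 30% test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids rounding surprises
    return shuffled[:cut], shuffled[cut:]


def rank_models(results):
    """Rank (name, accuracy, seconds) tuples: higher accuracy, then lower time."""
    return sorted(results, key=lambda r: (-r[1], r[2]))


train, test = split_70_30(range(10))

# Hypothetical evaluation results for three models of one time-series period.
ranked = rank_models([
    ("tree", 0.95, 2.0),
    ("cnn", 0.97, 5.0),
    ("lstm", 0.97, 1.0),
])
best = ranked[0][0]
```

A weighted score of accuracy and time would be an equally plausible reading of the ranking criterion; the lexicographic order is used here only because it needs no extra parameter.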

Step 408: To obtain information matched to the network protocols, three layers are distinguished: the network layer, the application layer, and the user and relationship layer. For each layer, key protocol elements and statistics are extracted and feature samples are processed.

Step 409: Train on the network layer samples using the previously evaluated optimal model.

Step 410: Train on the application layer samples using the previously evaluated optimal model.

Step 411: Train on the user and relationship layer samples using the previously evaluated optimal model.

Step 412: Multi-mode time-series exponential moving weight integration;

During real-time detection, the optimal models evaluated for each time-series range are fused with weights from the most recent to the most distant, exponentially decaying the older models.

Step 413: Multi-source model Bayesian inference integration;

During post-event comprehensive detection, the multi-source models obtained in steps 409, 410 and 411 are integrated, and ensemble learning is performed using Bayesian inference.

Step 414: Evaluate each optimal sub-model from the above steps and compare it with the multi-mode time-series exponential moving weight ensemble model and the multi-source Bayesian inference ensemble model.

Step 415: At the end of each time-series stage, after the sample set iteration and new model training above are complete, the models whose accuracy has improved on the newly built training and test data sets are selected and put into effect in the environment. The next round of data extraction, model building and old-versus-new comparison then begins.

The invention discloses the following. First, a method for building a large-capacity on-chip storage resource, the SPM, by integrating high-speed, high-capacity memory and network card cores on an SoC-based embedded computing platform; the on-chip network card core copies data into the SPM region only once, and the SPM memory is used transparently through NUMA awareness and the huge page memory mechanism. Second, a DPDK-based technique for concurrent, partitioned traffic collection across multiple physical network cards; packets are received through two-level lock-free concurrency and a user-mode protocol stack, so that high-speed traffic above 10G can be split into dozens of parallel processing flows. Third, a strategy for extracting, filtering and caching IP protocol data fields that adapts to the resource requirements of each detection class and distributes data through three channels to subsystems of different capabilities for detection; a batch of data is condensed by extraction, filtering and caching, forwarded asynchronously to multiple models, and abnormal records are detected and fused in parallel. Fourth, an iterative fusion algorithm for multi-source, multi-mode detection results that adapts to the environment, appends typical samples of each analysis topic through an automatic periodic sliding window, trains detection and generation deep learning models with multi-stage iterative adversarial training, ranks models to find the set of optimal detection models for different time-series period ranges, and fuses multi-mode detection results while shrinking the weights of older historical models geometrically, steadily improving the system's detection capability. Fifth, a method for abridging and storing results that queries the fused detection result data set and processes PCAP and forensic files; retention periods are defined for events of different security levels, and the result samples that need long-term retention are filtered, abridged and indexed to form abridged files and sample indexes, reducing storage volume and improving retrieval efficiency. Finally, data such as traffic statistics, sampled normal traffic and detection results are transferred to the big data platform in daily batches, and the abridged files are uploaded to the big data platform in daily batches; data is stored by category, and event data of different security levels is progressively purged according to the defined life cycle. For network traffic collection and analysis under 6G full connectivity and high bandwidth, the invention proposes a new high-efficiency hardware implementation and software extension mechanism, realizes concurrency and caching at every level of network data stream processing, reduces energy consumption, improves system performance, and meets the deployment requirements of a high-efficiency security situational awareness system in future smart home network environments.

Claims

1. An intelligent network sensing method for optimizing the efficiency of a progressive concurrent cache, characterized by comprising the following steps: on an SoC-based embedded computing platform, a single network card core and a single CPU core are paired to form an independent partition; in each partition, a plurality of large-capacity on-chip storage resource SPM (Scratch Pad Memory) blocks are built from integrated memory and accessed directly by memory address; a home Internet communication gateway device is thereby formed;

then, the home Internet communication gateway device is connected to each Internet device in the home, the network cards collect the traffic of different Internet devices in parallel, and user-mode multithreaded concurrent processing is performed based on the Data Plane Development Kit (DPDK);

the specific process is as follows:

first, a distribution main process captures packets from each network card, sorts the 5-tuple of each captured packet, and builds a unique concurrency identifier, the UFID, by hashing the resulting character string, with the formula:

UFID = shash(sort(srcip, srcport, dstip, dstport), protocol)

where srcip, srcport, dstip and dstport are the source IP address, source port, destination IP address and destination port respectively; the sort ensures that packets sent and received in both directions of a flow splice the four fields into a character string in the same order, and after the protocol type field is appended, the hash id of the string is computed with the shash algorithm;

then, according to the number C of concurrent channels, the hash computed from the character string assigns the bidirectional packets of the same flow to the same corresponding queue Qi for storage, and they are distributed to a processing subprocess through that queue;

the processed flow packets are parallelly distributed to a three-channel subsystem for detection according to IP protocol detection and routing configuration; carrying out iterative fusion on each detection result output by the three-channel subsystem;

the three-channel subsystem comprises an application layer abnormal rule detection subsystem, a network layer traffic characteristic and anomaly model detection subsystem, and a PCAP (packet capture) and forensic file detection subsystem;

the three-channel subsystem parallel detection method comprises the following specific steps:

step 301, after completing extraction, filtering and caching of IP protocol data fields of each flow packet by each subprocess, distributing the flow packets to an application layer abnormal rule detection subsystem in an IP tunnel mode, extracting metadata defining rules and application layer protocol metadata, and performing rule matching detection through a computational logic expression;

step 302, forwarding the extracted application layer protocol metadata to a machine learning model through a distributed message queue, performing near-real-time deep detection by using a network layer traffic characteristic and abnormal model detection subsystem, and finding out a user behavior label with fuzzy semantics and similarity;

step 303, at the same time, directly storing the IP protocol data fields in the extracted metadata into a time sequence database, performing stream processing and multiple index statistics, forwarding the statistical indexes to a machine learning model through a distributed message queue, performing long-period network behavior anomaly detection and classification detection by using a network layer flow characteristic and anomaly model detection subsystem, and finding unknown anomalies represented by time sequence type statistical indexes;

step 304, backing up the PCAP file through a locally created virtual network card, scanning the PCAP file according to subscription rules to extract, as evidence, the resource files transmitted over the network, and forwarding them through a distributed message queue to the PCAP and forensic file detection subsystem for virus detection;

finally, querying the fused detection result data set on a schedule and processing the PCAP and forensic files; transferring traffic statistics, sampled normal traffic and detection results to a big data platform in daily batches, storing the data by category, and progressively purging event data of different security levels according to a defined life cycle.

2. The intelligent network perception method for progressive concurrent cache optimization performance as claimed in claim 1, wherein said mass on-chip memory resource SPM block stores 2M data including network card data and CPU processed data; a plurality of SPM blocks construct a cache partition, and the requirement of different types of network card packet processing rates is met; the data stored on each SPM block is directly accessed through the memory address, and continuous stream access among the blocks is carried out through the initial address and the load data length without being transmitted through a data bus.

3. The intelligent network sensing method for optimizing performance of progressive concurrent caching as claimed in claim 1, wherein in step 301, the metadata comprises a network tcp/ip packet quintuple, protocol fields, an uplink byte number and a downlink byte number;

the application layer protocol metadata comprises a URL (Uniform resource locator) of an http protocol, a request code, a response code, submitted data and responded data;

the rule matching detection serves to discover abnormal behaviors with obvious network traffic and protocol field characteristics and to raise alarms in real time.

4. The intelligent network-aware method for progressive concurrent cache performance optimization according to claim 1, wherein the step 302 specifically comprises: aiming at network traffic carried and triggered by malicious behaviors to be executed in an application layer, a plurality of analysis models of characteristic field scanning, state machine reasoning, machine learning model classification, abnormal semantics and abnormal logic flow are established, and the application behavior types of the network traffic are labeled from a plurality of dimensions.

5. The intelligent network-aware method for progressive concurrent cache optimization performance according to claim 1, wherein the iterative fusion specific process comprises:

first, regularly collecting the detection results output by the three-channel subsystem, augmenting the training data through dimensionality reduction, vectorization, sparsification and reconstruction, training detection and generation deep learning models, and building a sustainable self-learning training environment for the models using reinforcement learning and adversarial generation algorithms;

then, ranking and selecting the trained deep learning models by the criteria of high accuracy and low time consumption, shrinking the weights of older historical models geometrically, and performing time-series exponential moving average weighted fusion learning to obtain a fused detection result;

specific fusion is performed according to three topics including:

1) integrating a multi-source deep learning model obtained by a network layer, an application layer, a user and a relation layer, and performing integrated learning through Bayesian inference;

2) establishing a relation set and a composite alarm event set through a knowledge graph;

3) constructing a typical behavior label, dividing a single event detection result and labeling a group event set through the typical behavior portrait of the user;

and finally, modeling a Bayesian network model based on the historical sample detection accuracy, calculating the joint detection probability of each classification of the new sample by inference, and continuously updating the deep learning model by using the result.

6. The intelligent network sensing method for optimizing performance of progressive concurrent caching as claimed in claim 1, wherein said regularly querying the fused detection result dataset specifically comprises:

firstly, defining the storage periods of events with different security levels, and dividing the storage requirements of data;

then, according to the result sample which needs to be stored for a long time, the related files are filtered, reduced and indexed to form a reduced file and a sample index.