Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
Network attacks are actions by which an attacker illegally invades, destroys, or steals from computer systems, networks, and data using technical means or vulnerabilities, such as virus propagation, phishing fraud, DDoS attacks, and the like. Network attacks may cause serious consequences such as information leakage, data tampering, service interruption, property loss, and even the breakdown of social infrastructure. Therefore, in order to maintain the ecological stability of the network and the sustainable development of the urban digital society, network attacks need to be detected, so that threats are identified in a timely manner, attack chains are blocked, and the privacy and asset security of users are protected.
In the related art, a signature-based detection method is generally adopted to detect network attacks. The signature-based detection method matches known threat patterns in network traffic or system behaviors against a predefined attack feature signature library, and when a monitored behavior completely matches an attack feature in the signature library, the system triggers an alarm and takes defensive measures. However, the signature-based detection method relies on a signature library of known attack characteristics; for novel attacks or unknown threats, especially when an attacker bypasses detection by means such as code obfuscation, encryption, or deformation, the signature library may fail to identify the attack, so there is a problem of insufficient detection accuracy and flexibility when performing network attack detection.
Based on the above, the embodiment of the application provides a network attack flow detection method, a network attack flow detection device, a computer device and a storage medium, which can improve the flexibility and the accuracy of network attack flow detection.
The method, the device, the computer equipment and the storage medium for detecting the network attack flow provided by the embodiment of the application are specifically described through the following embodiments, and the network attack flow detection system in the embodiment of the application is described first.
Referring to fig. 1, in some embodiments, a network attack traffic detection system is provided in the present application, which includes a terminal 11 and a server 12.
The terminal 11 may be used for collecting original network traffic data and performing preliminary data preprocessing, and may be a network traffic collection device such as a network switch, a router or a dedicated network traffic collector, a personal computer or a workstation for capturing local traffic, or an internet of things device such as an intelligent camera, an intelligent home device, etc.
Further, the server side may be a high performance server, a CPU-accelerated server, a distributed computing cluster, a cloud service platform, and the like. The terminal 11 first detects and captures network traffic in real time, then performs preprocessing operations such as basic filtering, decoding transmission control protocol (Transmission Control Protocol, TCP) streams, and extracting hypertext transfer protocol (Hypertext Transfer Protocol, HTTP) request/response key information, converts binary format data into a readable text format, and generates a structured output file. The preliminarily processed data is uploaded to the server side 12 by the terminal 11, and the server side 12 is responsible for receiving the data from different terminals 11 and integrating it into a unified data management system, which facilitates subsequent data cleaning, formatting, and construction of the pre-training data set and the fine-tuning data set.
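The terminal-side preprocessing described above can be sketched as follows: a raw (binary) HTTP request captured from a TCP stream is decoded and its key fields are extracted into a structured, readable record. This is a minimal illustration only; the field names and record layout are assumptions, not part of the application.

```python
def preprocess_http_request(raw: bytes) -> dict:
    """Decode a captured HTTP request and extract key request information."""
    text = raw.decode("utf-8", errors="replace")    # binary -> readable text
    head, _, body = text.partition("\r\n\r\n")      # split headers from body
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)  # request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "path": path,
        "version": version,
        "host": headers.get("host", ""),
        "body": body,
    }

record = preprocess_http_request(
    b"GET /login?user=admin HTTP/1.1\r\nHost: example.com\r\n\r\n"
)
```

A record of this shape can then be serialized as one line of the structured output file uploaded to the server side 12.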
Furthermore, the server 12 can pre-train and fine-tune the large language model using the constructed pre-training data set and fine-tuning data set, enhance the time sequence modeling capability of the model through an autoregressive method, and realize efficient parameter adjustment by adopting low-rank adaptation (Low-Rank Adaptation, LoRA) technology, so that the model can better adapt to specific task requirements in the network security field. Finally, the trained target model is deployed at the server 12 and is used for analyzing in real time the network traffic data uploaded from the terminal 11, identifying potential security threats, mapping the detection results to security knowledge bases such as the open Web application security project (Open Web Application Security Project, OWASP) and the common weakness enumeration (Common Weakness Enumeration, CWE), and generating detailed attack reports and response suggestions for the user or directly triggering defensive measures.
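The mapping of detection results onto security knowledge bases can be sketched as a simple lookup. The table below is a small illustrative subset (CWE-89 and CWE-79 are the standard identifiers for SQL injection and cross-site scripting, both falling under OWASP Top 10 2021 category A03:2021-Injection); a deployment would use a fuller knowledge base.

```python
# Illustrative subset of a CWE/OWASP knowledge-base mapping; not exhaustive.
KNOWLEDGE_BASE = {
    "sql injection": {"cwe": "CWE-89", "owasp": "A03:2021-Injection"},
    "cross-site scripting": {"cwe": "CWE-79", "owasp": "A03:2021-Injection"},
}

def map_to_knowledge_base(attack_type: str) -> dict:
    """Map a model-reported attack type to CWE/OWASP identifiers."""
    entry = KNOWLEDGE_BASE.get(attack_type.lower())
    if entry is None:
        return {"cwe": "unknown", "owasp": "unknown"}
    return entry

report = map_to_knowledge_base("SQL Injection")
```

The resulting identifiers can be embedded in the attack report or response suggestion generated for the user.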
The method for detecting the network attack flow in the embodiment of the application can be illustrated by the following embodiment.
In the embodiments of the present application, when related processing is required according to user information, user behavior data, user history data, user location information, and other data related to user identity or characteristics, permission or consent of the user is obtained first. Moreover, the collection, use, processing, etc. of such data would comply with relevant laws and regulations. In addition, when the embodiment of the application needs to acquire the sensitive personal information of the user, the independent permission or independent consent of the user is acquired through popup or jump to a confirmation page and the like, and after the independent permission or independent consent of the user is definitely acquired, the necessary relevant data of the user for enabling the embodiment of the application to normally operate is acquired.
In the embodiment of the present application, description will be made from the dimension of the network attack traffic detection device, which may be integrated in a computer apparatus in particular. Referring to fig. 2, fig. 2 is a flowchart illustrating steps of a network attack traffic detection method according to an embodiment of the present application, where in the embodiment of the present application, a network attack traffic detection device is specifically integrated on a terminal or a server, for example, and when a processor on the terminal or the server executes a program instruction corresponding to the network attack traffic detection method, the specific flow is as follows:
Step 101, obtaining a flow to be detected and a detection instruction generated according to the flow to be detected.
In some embodiments, in order to lay a foundation for subsequent deep analysis and threat processing, the flow to be detected and the corresponding detection instruction can be acquired, so that the subsequent large language model can learn and infer based on real network environment information, and the accuracy and pertinence of detection are improved.
The traffic to be detected may be actual network traffic data packets captured from the network environment, where the data packets may include normal traffic content, and may also include potential attack or abnormal activities.
The detection instruction can be a specific task instruction automatically generated based on the flow to be detected or manually formulated by a technician and is used for guiding the large language model to execute a specific safety detection task.
The flow to be detected may be obtained by directly capturing data packets from a network interface using a professional traffic capture tool (such as Wireshark or tcpdump), by configuring a mirror port on a switch or router so that the traffic of a designated port is copied to the detection device for analysis, or by inserting a test access point (Test Access Point, TAP) device into a network link to copy the traffic. Other modes of obtaining the flow to be detected may also be used in practical applications.
Specifically, the traffic to be detected may be internal network traffic, boundary traffic, cloud environment traffic, terminal traffic, and the like.
Further, in order to avoid wasting resources, the flow may be preliminarily filtered. For example, only traffic of a particular protocol or a particular internet protocol (Internet Protocol, IP) address range may be captured. The filter rules may be defined using Berkeley packet filter (Berkeley Packet Filter, BPF) syntax.
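As a hedged sketch of this preliminary filtering step, the predicate below keeps only TCP port 80 traffic from an assumed source range, dropping everything else before deeper analysis. The equivalent BPF expression would be roughly `tcp port 80 and src net 192.168.1.0/24`; the packet dictionary layout here is an illustrative assumption.

```python
import ipaddress

ALLOWED_NET = ipaddress.ip_network("192.168.1.0/24")  # assumed source range
HTTP_PORT = 80

def keep_packet(packet: dict) -> bool:
    """Preliminary filter: TCP port 80 traffic from the allowed source net."""
    return (
        packet.get("protocol") == "tcp"
        and packet.get("dst_port") == HTTP_PORT
        and ipaddress.ip_address(packet["src_ip"]) in ALLOWED_NET
    )

packets = [
    {"protocol": "tcp", "dst_port": 80, "src_ip": "192.168.1.10"},
    {"protocol": "udp", "dst_port": 53, "src_ip": "192.168.1.10"},
    {"protocol": "tcp", "dst_port": 80, "src_ip": "10.0.0.5"},
]
kept = [p for p in packets if keep_packet(p)]
```

In practice the same filter would normally be pushed down into the capture tool itself as a BPF string, so that non-matching packets are dropped in the kernel rather than in user space.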
For example, the detection instruction to be detected may be determined according to the protocol type of the traffic to be detected. Detection instructions may also be automatically generated for subsequent threat analysis and response based on traffic characteristics, predefined rules, machine learning models, or threat intelligence.
The model may be required to classify and identify specific types of attacks (e.g., SQL injection, cross-site scripting attacks) or to perform feature extraction and risk assessment on unknown traffic patterns. In this technical scenario, the detection instruction is often combined with professional knowledge in the field of network security and expanded into a richer instruction set through a self-instruction method (Self-Instruct) or the like, so as to enhance the learning capacity and application range of the model.
By the method, solid data support and action guidelines can be provided for subsequent safety analysis, and the efficiency and accuracy of detecting the flow to be detected can be improved.
Step 102, inputting the flow to be detected and a detection instruction into a large language model to obtain a flow detection result corresponding to the flow to be detected;
The large language model is obtained by performing flow pattern learning training based on a first loss through a first preset large language model to obtain a second preset large language model, and performing instruction understanding training based on a second loss through the second preset large language model;
The first loss is determined according to the difference between the predicted flow to be detected and the first sample flow to be detected through a first preset large language model, and the predicted flow to be detected is obtained through recursion prediction based on the first sample flow to be detected through the first preset large language model;
the recursion prediction process is a recursion process of generating an intermediate prediction sequence based on a first flow characteristic corresponding to a first time step of the flow to be detected of the first sample and predicting a next flow characteristic based on the intermediate prediction sequence so as to update the intermediate prediction sequence according to the next flow characteristic;
and determining the second loss through a second preset large language model based on the difference between the predicted flow detection result output by the flow to be detected of the second sample and the sample flow detection result.
In some embodiments, in order to obtain a detailed detection result for the flow, the flow to be detected and a corresponding detection instruction may be input into a specifically trained large language model, so as to improve the efficiency and accuracy of detection.
The large language model may be a trained large neural network model capable of understanding and generating natural language or structured data (e.g., network traffic). In this context, it is used to analyze and interpret network traffic and provide detailed traffic detection results according to given detection instructions.
The flow detection result may be a result output by the large language model after the input flow to be detected and the detection instruction, and includes judging information about whether the flow is a malicious attack and the type thereof.
The first preset large language model may be an initial version of the large language model, and is used for learning a flow mode from a large amount of historical network flow data, and performing preliminary training by optimizing the first loss.
The first loss can be used for measuring the difference between the predicted flow to be detected of the first preset large language model and the actual flow to be detected of the first sample, and is the basis for optimizing model parameters.
The second preset large language model can be a model obtained by training based on instruction understanding based on the first preset large language model, and aims to better understand and execute detection tasks related to flow detection.
The second loss may be a difference between the predicted flow detection result output by the second predetermined large language model and the actual sample flow detection result when the second sample flow to be detected is processed, so as to guide fine adjustment of the model. Because the second preset large language model has already learned the flow pattern, the second preset large language model needs to be trained for its ability to understand the instructions related to flow detection, and if the second preset large language model can accurately understand the instructions, the second loss is smaller, and otherwise, larger.
The first sample traffic to be detected may be sample data for training a large language model to learn traffic patterns, which may be known normal and abnormal network traffic instances. The sample type of the traffic to be detected for each first sample may be attack payload, HTTP request fragment, vulnerability code, etc.
The second sample traffic to be detected may be sample data for further training the model to understand specific detection instructions, which may be known normal and abnormal network traffic instances.
The first time step may be a first time point in the time sequence representation of the network traffic, where the first time step corresponds to a first characteristic value of the traffic to be detected by the first sample.
The first traffic characteristic may be a specific traffic characteristic over a first time step, such as a first HTTP request header field in a TCP stream.
The intermediate prediction sequence may be a series of prediction values gradually constructed in the recursive prediction process, and is continuously updated and perfected as more time steps of information are added.
The predicted traffic detection result may be a result predicted by the large language model based on the input traffic and the detection instruction regarding the traffic property (e.g., whether or not there is an attack).
The sample flow detection result can be a known standard answer or a flow detection result under the real condition, and is used for comparing and evaluating the accuracy of model prediction.
Illustratively, if it is desired to detect whether there is a potentially malicious behavior (e.g., SQL injection attack) on enterprise A's network traffic. If the acquired flow to be detected is:
Timestamp: 2025-04-14T10:00:00;
Source IP: 192.168.1.10;
Target IP: 203.0.113.5;
Protocol: HTTP;
Request content: GET /login?username=admin' OR '1'='1' -- HTTP/1.1;
For example, if the pair of detection instructions is generated according to the flow to be detected:
checking whether the HTTP request contains SQL injection keywords (such as ' OR '1'='1', UNION SELECT, etc.); if an SQL injection feature is found, marking it as high-risk traffic and generating an alarm log.
Inputting the flow to be detected and the detection instruction into a trained large language model, wherein the large language model firstly analyzes the content of the flow, and extracts key fields (such as a time stamp, a source IP, a target IP, a protocol type, request content and the like). Then, the large language model understands the requirement of the detection instruction, identifies key features (such as SQL injection keywords) to be detected, and obtains a corresponding flow detection result. For example, the flow detection result may be:
Flow detection result:
Detection state: high risk;
Detection reason: the SQL injection keyword ' OR '1'='1' was found;
Response suggestion: mark as high-risk traffic, generate an alarm log and notify the security team;
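The detection instruction in this example can be sketched as a small rule-based check: scan the request content for common SQL injection keywords and emit a flow detection result in the same shape as the example above. The keyword list is illustrative, not exhaustive, and this sketch is a stand-in for the large language model's analysis, not the model itself.

```python
import re

# Illustrative SQL injection patterns from the detection instruction.
SQLI_PATTERNS = [
    r"'\s*OR\s*'1'\s*=\s*'1'",   # classic tautology ' OR '1'='1'
    r"UNION\s+SELECT",           # UNION-based injection
]

def detect_sqli(request_content: str) -> dict:
    """Return a flow detection result for one HTTP request."""
    for pattern in SQLI_PATTERNS:
        match = re.search(pattern, request_content, re.IGNORECASE)
        if match:
            return {
                "state": "high risk",
                "reason": f"SQL injection keyword found: {match.group(0)}",
                "suggestion": "mark as high-risk traffic and raise an alarm",
            }
    return {"state": "normal", "reason": "", "suggestion": ""}

result = detect_sqli("GET /login?username=admin' OR '1'='1' -- HTTP/1.1")
```

Unlike a fixed rule set, the trained model is expected to generalize beyond these literal patterns, which is the motivation for the training procedure described below.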
The detection instruction can also be generated by the model automatically detecting key fields in the flow to be detected. For example, if the request content contains the suspicious SQL injection keyword ' OR '1'='1' while other fields (such as the source IP and target IP) show no obvious anomaly, the model can infer from these key fields that this is possibly an SQL injection attack attempt. Based on this analysis, the model can automatically generate a detection instruction for this type of threat, and the generated detection instruction can be applied to the detection of the current flow or other similar flows to be detected, which can improve detection efficiency and accuracy and reduce dependence on manual intervention. Alternatively, the detection instruction can be written manually based on the experience of a security expert; the form and setting mode of the detection instruction can be set according to actual conditions, and the application is not particularly limited in this respect.
For example, training of a model may be divided into two phases, traffic pattern learning training based on a first penalty and instruction understanding training based on a second penalty. The training process in these two phases will be described below.
Illustratively, because network traffic has complex patterns and structures, which may include time series characteristics, protocol types, packet sizes, etc., the present application forces a first pre-set large language model to learn these patterns by employing a recursive predictive method in a first stage of model training so that the model better understands and predicts future traffic, thereby identifying anomalies.
Specifically, the first sample flow to be detected may be input into the first preset large language model, and the first preset large language model analyzes the flow characteristic of the first time step (such as the first character or field of the first sample flow to be detected). The flow characteristic of the next time step is predicted from the flow characteristic of the first time step, and the intermediate prediction sequence is updated with the flow characteristic of the first time step and the flow characteristic of the next time step. The flow characteristic of the following time step is then predicted from the updated intermediate prediction sequence, and the intermediate prediction sequence is continuously updated in this way until the prediction of the whole flow sequence is completed, that is, until the number of features in the updated intermediate prediction sequence is consistent with the number of features of the first sample flow to be detected; the prediction then stops, and the predicted flow to be detected is obtained.
Further, if the first preset large language model correctly learns the flow pattern of the network flow, the predicted flow to be detected is consistent with the first sample flow to be detected; that is, the first loss constructed based on the difference between the predicted flow to be detected and the first sample flow to be detected is small, and otherwise it is large. After each new predicted flow to be detected is generated, the first loss (for example, a cross-entropy loss) is calculated, and the parameters of the first preset large language model are adjusted according to the first loss to obtain the second preset large language model.
The first loss may be a mean square error loss, a cross entropy loss, a sequence-to-sequence loss, or the like, and the corresponding loss function may be specifically selected according to the actual situation to calculate the first loss.
For example, on the basis that the model has good flow pattern recognition capability, the model can be further trained in the second stage to understand specific detection instructions, and accurate detection results can be generated according to the instructions.
Specifically, in the second stage of training, that is, the fine-tuning stage of the second preset large language model, each second sample flow to be detected and a corresponding sample detection instruction may be input into the second preset large language model, and the second preset large language model processes the corresponding second sample flow to be detected according to the given sample detection instruction to generate a corresponding predicted flow detection result. For example, the predicted flow detection result of the second sample flow A to be detected may be:
Detection state: high risk;
Detection reason: the SQL injection keyword ' OR '1'='1' was discovered.
The sample flow detection result of the second sample flow A to be detected is as follows:
Detection state: high risk;
Detection reason: the SQL injection keyword ' OR '1'='1' was discovered.
If the predicted flow detection result completely matches the sample flow detection result, the second loss is small or 0; otherwise, the second loss is larger. Fine tuning the second preset large language model based on the second loss can enhance the capability of the model to generate targeted responses according to specific instructions and improve response effectiveness.
The second loss may be a classification loss, a regression loss, a cross entropy loss, or the like, and the second loss may be calculated by selecting a corresponding loss function according to actual situations.
In summary, in the embodiment of the application, the flow to be detected and a detection instruction generated according to the flow to be detected are obtained, and the flow to be detected and the detection instruction are input into a large language model to obtain a flow detection result corresponding to the flow to be detected. The large language model is obtained by performing flow pattern learning training based on a first loss through a first preset large language model to obtain a second preset large language model, and performing instruction understanding training based on a second loss through the second preset large language model. The first loss is determined according to the difference between the predicted flow to be detected and the first sample flow to be detected, where the predicted flow to be detected is obtained through recursion prediction based on the first sample flow to be detected by the first preset large language model; the recursion prediction process generates an intermediate prediction sequence based on the first flow characteristic corresponding to the first time step of the first sample flow to be detected, predicts the next flow characteristic based on the intermediate prediction sequence, and updates the intermediate prediction sequence according to the next flow characteristic. The second loss is determined by the second preset large language model based on the difference between the predicted flow detection result output for the second sample flow to be detected and the sample flow detection result.
Therefore, the method and the device can learn and train a large number of flow modes without depending on a predefined attack characteristic signature library, realize self-adaptive learning of flow characteristics by using a first preset large language model, force the model to deeply understand an implicit attack behavior evolution rule in the flow, even if the attack flow is encrypted or deformed in a segmentation way, the model can still recognize an abnormal mode through a context dependency relationship, and improve the flexibility and accuracy of the model to network attack flow detection. In summary, the application can enable the model to not only identify the statistical abnormal characteristics of unknown attack, but also flexibly switch the detection logic according to the real-time defense strategy, thereby realizing the double breakthrough of accuracy and adaptability in the dynamic countermeasure scene, i.e. the application can improve the flexibility and accuracy of detecting the network attack flow.
Referring to fig. 3, in some embodiments, in order to enable a large language model to accurately identify network attack traffic, the model may be pre-trained according to a preset pre-training data set to enable the model to primarily grasp traffic patterns, and then model parameters are further optimized through a preset fine tuning data set to enable the model to accurately understand and execute specific detection instructions. Illustratively, a large language model may be trained by:
step 201, obtaining a first sample to-be-detected flow from a preset pre-training data set, and performing recursion prediction based on the first sample to-be-detected flow through a first preset large language model to obtain a predicted to-be-detected flow;
Step 202, determining a first loss based on a difference between a first sample flow to be detected and a predicted flow to be detected;
step 203, training the first preset large language model based on the first loss to obtain a second preset large language model;
Step 204, obtaining a second sample to-be-detected flow, a corresponding sample flow detection result and a plurality of sample detection instructions corresponding to the second sample to-be-detected flow from a preset fine-tuning data set, and sequentially inputting the second sample to-be-detected flow and each sample detection instruction into a second preset large language model to obtain a predicted flow detection result;
Step 205, determining a second loss based on a difference between the predicted flow detection result and the sample flow detection result;
And 206, fine-tuning the second preset large language model based on the second loss to obtain the large language model.
The pre-training data set may be an unstructured/weakly structured corpus composed of original data (such as network traffic, attack load, vulnerability code, etc.) in the attack detection field, and is used for performing secondary pre-training on a first preset large language model (such as LLaMA) so as to enable the first preset large language model to learn the grammar structure, the attack behavior characteristics and the vulnerability code logic of the network traffic.
The fine-tuning data set can be a task-oriented data set composed of instruction-response pairs, and is used for aligning specific capabilities (such as classification and entity identification) of the large language model in an attack detection scene.
The sample detection instruction can be a specific task instruction provided for the large language model in the fine tuning stage and used for guiding the model to process and analyze the input flow to be detected. For example, a sample detection instruction may be one that requires the model to identify a particular type of attack (e.g., SQL injection), or to extract certain critical information from the traffic (e.g., IP address), or to interpret the traffic's behavior, etc. Sample detection instructions are used to make the model better understand and adapt to the complex demands of practical applications.
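One record of such a fine-tuning data set can be sketched as follows: a second sample flow to be detected, a sample detection instruction, and the expected sample flow detection result, stored together as an instruction-response pair. The field names and prompt format below are illustrative assumptions, not part of the application.

```python
# One illustrative instruction-response record in the fine-tuning data set.
fine_tuning_record = {
    "traffic": "GET /login?username=admin' OR '1'='1' -- HTTP/1.1",
    "instruction": (
        "Check whether the HTTP request contains SQL injection keywords; "
        "if found, mark it as high-risk traffic."
    ),
    "response": "Detection state: high risk; reason: ' OR '1'='1' found.",
}

def to_prompt(record: dict) -> str:
    """Assemble instruction and traffic into a single training prompt."""
    return f"{record['instruction']}\nTraffic: {record['traffic']}"

prompt = to_prompt(fine_tuning_record)
```

During fine tuning, the model is given `prompt` as input and trained to reproduce the `response` field, which is what the second loss measures.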
In some embodiments, each first sample flow to be detected in the pre-training data set may be a section of text or a structured data segment including an attack feature or a normal flow feature. When the first sample flow to be detected is obtained from the pre-training data set and secondary pre-training is performed on the first preset large language model, the model needs to recursively predict the flow feature of each subsequent time step, which forces the model to understand causal logic in the attack flow so as to grasp the network flow grammar structure (such as the HTTP protocol format) and attack behavior patterns (such as SQL injection load features), for example, an injection pattern in which SELECT ... FROM users is followed by user='admin'.
Specifically, in the training process of the first preset large language model, the flow to be detected of each first sample can be divided into a plurality of flow characteristics according to a plurality of time steps, prediction is performed through the first preset large language model based on the first flow characteristics corresponding to the first time step, the flow characteristics of the next time step adjacent to the first time step are obtained, the flow characteristics of the next time step are updated to an intermediate prediction sequence where the flow characteristics corresponding to the first time step are located, and an updated intermediate prediction sequence is obtained. And then, predicting the flow characteristics of the next time step according to the updated intermediate prediction sequence, and so on, and finally obtaining the predicted flow to be detected.
Further, the first loss may be calculated based on a difference between the first sample flow to be detected and the predicted flow to be detected. Illustratively, the first loss may be a cross entropy loss L_pre-train, with the formula:
L_pre-train = -Σ_{t=1}^{T} log P_θ(x_t | x_{<t})
wherein T is the total number of time steps into which the first sample flow to be detected is divided, θ is a model parameter, x_{<t} is the flow characteristic corresponding to the previous time steps, and x_t is the flow characteristic predicted according to x_{<t}.
Further, after the first loss is calculated, model parameters of the first preset large language model can be adjusted through back propagation, so that the model parameters are more prone to generating a sequence consistent with the real flow.
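The cross-entropy pre-training loss described here is the negative sum of the log-probabilities the model assigns to each true next flow characteristic given the preceding ones. The toy computation below illustrates this; the probability values are assumptions standing in for the first preset large language model's outputs.

```python
import math

def pretrain_loss(next_token_probs: list) -> float:
    """L_pre-train = -sum_t log P(x_t | x_{<t}); inputs are P(x_t | x_{<t})."""
    return -sum(math.log(p) for p in next_token_probs)

# Assume the model assigned these probabilities to the T=3 true next
# flow characteristics of one first sample flow to be detected.
probs = [0.9, 0.8, 0.5]
loss = pretrain_loss(probs)
# A model that always assigns probability 1.0 to the true characteristic
# would achieve loss 0; lower loss means the flow pattern is better learned.
```

Back-propagating this loss nudges the model parameters toward assigning higher probability to the real flow sequence, which is exactly the adjustment described above.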
Specifically, each second sample flow to be detected in the fine-tuning data set is associated with at least one corresponding sample detection instruction and sample flow detection result. For example, a second sample flow to be detected may be GET /search?q=<script>alert(1)</script> HTTP/1.1, its associated sample detection instruction may be to detect whether the request is an XSS attack and, if so, extract the attack load, and the sample flow detection result may be "Attack type: XSS\nAttack load: <script>alert(1)</script>". If the second preset large language model outputs "Attack type: XSS\nAttack load: <script>alert(1)</script>" for the second sample flow to be detected, the second loss is lower; otherwise, if the second preset large language model outputs "Attack type: none", the loss is higher.
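The sample detection instruction in this example can be sketched as a small extraction routine that decides whether a request is an XSS attack and, if so, extracts the attack load, producing output in the same "Attack type: ...\nAttack load: ..." shape as the sample flow detection result. The single `<script>` pattern is an illustrative stand-in for the model's learned behavior, not a complete XSS detector.

```python
import re

def detect_xss(request: str) -> str:
    """Return a sample-style detection result for one HTTP request line."""
    match = re.search(r"<script>.*?</script>", request, re.IGNORECASE)
    if match:
        return f"Attack type: XSS\nAttack load: {match.group(0)}"
    return "Attack type: none"

result = detect_xss("GET /search?q=<script>alert(1)</script> HTTP/1.1")
```

Comparing such an output string against the stored sample flow detection result is what drives the second loss during fine tuning.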
Furthermore, the second loss can be calculated according to the cross entropy loss, model parameters are updated through gradient descent based on the second loss, the relevance between the learning attack characteristics of the second preset large language model and the instruction is enhanced, and the generation capacity of the pre-training model is aligned with a specific safety analysis task, so that an accurate predicted flow detection result can be generated according to the instruction.
By performing secondary pre-training and fine tuning on the large language model, the large language model can gradually learn the general mode of network traffic and the characteristics of specific attack detection tasks. The fine tuning stage further optimizes the performance of the model on an attack detection task through a task-oriented instruction and marking data, thereby improving the accuracy and the flexibility of the model for detecting the network attack flow and enabling the model to more effectively identify novel and complex attacks.
In some embodiments, in order to enhance the understanding and modeling capability of the first preset large language model on the traffic pattern, a predicted traffic sequence may be generated by a recursive prediction manner, so that the first preset large language model can capture dynamic changes and time dependencies in traffic data, and provide a more accurate traffic pattern learning basis for subsequent attack detection tasks. Illustratively, "recursively predicting the flow to be detected based on the first sample by the first preset large language model to obtain the predicted flow to be detected" in step 201 may include:
(201.1) generating an intermediate prediction sequence based on a first flow characteristic corresponding to a first time step of the flow to be detected of the first sample through a first preset large language model;
(201.2) predicting a next flow characteristic for a next time step based on the intermediate prediction sequence, updating the next flow characteristic to the end of the intermediate prediction sequence to update the intermediate prediction sequence;
(201.3) repeatedly executing the steps of predicting the next flow characteristic of the next time step based on the intermediate prediction sequence and updating the next flow characteristic to the end of the intermediate prediction sequence, until the number of flow characteristics of the recursively updated intermediate prediction sequence is the same as the number of flow characteristics of the first sample flow to be detected, and obtaining the predicted flow to be detected based on the intermediate prediction sequence corresponding to the last time step.
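Steps (201.1) to (201.3) can be sketched as a simple autoregressive loop; `predict_next` below is a hypothetical stub standing in for the first preset large language model:

```python
def recursive_predict(first_feature, target_len, predict_next):
    """Grow an intermediate prediction sequence one flow characteristic
    per time step until it has target_len features (steps 201.1-201.3)."""
    seq = [first_feature]            # (201.1) seed with the first feature
    while len(seq) < target_len:     # (201.3) repeat until lengths match
        nxt = predict_next(seq)      # (201.2) predict the next feature
        seq.append(nxt)              # append it to the sequence end
    return seq                       # the predicted flow to be detected

# Toy stand-in for the model: emit a numbered placeholder feature.
stub = lambda seq: f"feat{len(seq) + 1}"
predicted = recursive_predict("POST", 4, stub)
# predicted -> ["POST", "feat2", "feat3", "feat4"]
```

Each prediction is conditioned on the whole intermediate sequence so far, which is what lets the model capture the time dependencies described above.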
Wherein the next flow characteristic may be a flow characteristic value of the next time step predicted based on the current intermediate prediction sequence in the recursive prediction process. The next traffic characteristic may be a specific traffic attribute at the next point in time, such as a next HTTP request header field or a portion of the response body content.
The last time step may be the time step reached at the end of the entire recursive predictive procedure, where the number of flow features in the intermediate predictive sequence matches the actual number of features of the flow to be detected for the first sample. In other words, when the recursive prediction process is completed and the intermediate prediction sequence includes the same number of features as the original sample flow, the corresponding time step is the last time step, and at this time, based on the complete and continuous intermediate prediction sequence, the final predicted flow to be detected can be generated.
For example, the first preset large language model may predict the subsequent flow characteristics gradually from the first flow characteristic of the first time step based on the autoregressive generating mechanism, and the prediction of each step depends on the updated intermediate prediction sequence, so as to finally generate the complete predicted flow to be detected, thereby enabling the model to learn a more complex flow mode and improving the generalization capability thereof. This process is exemplified below.
Illustratively, first, the first traffic characteristic of the first time step (e.g., "POST") is input; then, the characteristic of the next time step (e.g., "/log") is predicted based on the intermediate prediction sequence "POST" and added to the end of the intermediate prediction sequence, obtaining the updated intermediate prediction sequence "POST, /log". Thereafter, the characteristic of the next time step (e.g., "?user=") is predicted based on "POST, /log" and added to the end of the intermediate prediction sequence, obtaining the updated intermediate prediction sequence "POST, /log, ?user="; and so on, until the number of flow characteristics of the intermediate prediction sequence is the same as that of the first sample flow to be detected.
In some embodiments, in order to avoid error accumulation, the intermediate prediction sequence may also be constructed through a sliding window. Specifically, the first sample flow characteristic of the first time step is intercepted from the first sample flow to be detected through the sliding window, and the intermediate prediction sequence is generated according to the first sample flow characteristic. Then, the next flow characteristic corresponding to the next time step is predicted according to the intermediate prediction sequence through the first preset large language model; meanwhile, the next sample flow characteristic of the next time step is intercepted from the first sample flow to be detected through the sliding window, and the intermediate prediction sequence is updated by appending the next sample flow characteristic to its end, so that each prediction is conditioned on real sample characteristics rather than on earlier predictions. The steps of intercepting the next sample flow characteristic through the sliding window, updating the end of the intermediate prediction sequence according to the next sample flow characteristic, and predicting the next flow characteristic corresponding to the next time step according to the intermediate prediction sequence through the first preset large language model are repeatedly executed until the sum of the number of the first sample flow characteristic and the number of the predicted flow characteristics is the same as the number of flow characteristics of the first sample flow to be detected, and the predicted flow to be detected is generated based on the first sample flow characteristic and the plurality of predicted flow characteristics.
Therefore, the prediction sequence can be gradually constructed, the training efficiency and the model flexibility are improved, and the error accumulation is reduced. Illustratively, the following steps for generating the predicted flow rate to be detected in the above manner are exemplified:
illustratively, the first sample to-be-detected traffic is:
GET /index.html HTTP/1.1\r\nHost: example.com\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)\r\n\r\n;
Then, the first sample flow characteristic of the first time step (e.g., time step 1) may be intercepted through the sliding window as "GET", generating the intermediate prediction sequence "GET". The next flow characteristic corresponding to the next time step (e.g., time step 2) can be predicted according to the intermediate prediction sequence through the first preset large language model; meanwhile, the sample flow characteristic of the next time step (e.g., time step 2) is intercepted from the first sample flow to be detected to obtain "/index.html", and "/index.html" is updated to the end of "GET", obtaining the updated intermediate prediction sequence "GET /index.html". Thereafter, the flow characteristic of the next time step (e.g., time step 3) is predicted according to the intermediate prediction sequence (e.g., "/5.0"), the sample flow characteristic of the next time step (e.g., time step 3) is intercepted from the first sample flow to be detected through the sliding window to obtain "HTTP/1.1", and "HTTP/1.1" is updated to the end of "GET /index.html" to obtain the updated intermediate prediction sequence "GET /index.html HTTP/1.1"; and so on, until the sum of the number of the first sample flow characteristic and the number of the plurality of predicted flow characteristics is the same as the number of flow characteristics of the first sample flow to be detected, and the predicted flow to be detected is generated based on the first sample flow characteristic and the plurality of predicted flow characteristics. It should be noted that, in this example, the sample flow characteristic is a characteristic of the corresponding time step intercepted from the first sample flow to be detected, while the flow characteristic is the predicted characteristic of the corresponding time step.
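The sliding-window variant differs from the purely recursive loop in one respect: the model's prediction is recorded at each step, but the ground-truth characteristic intercepted from the sample is what gets appended to the intermediate sequence. A sketch under the same stub-model assumption:

```python
def sliding_window_predict(sample_features, predict_next):
    """Predict each next flow characteristic, but append the real
    sample characteristic to the intermediate sequence so prediction
    errors do not accumulate across time steps."""
    seq = [sample_features[0]]          # real feature of time step 1
    predictions = [sample_features[0]]  # the first feature is given
    for truth in sample_features[1:]:
        predictions.append(predict_next(seq))  # record the prediction
        seq.append(truth)               # feed back the ground truth
    return predictions

sample = ["GET", "/index.html", "HTTP/1.1"]
stub = lambda seq: f"pred{len(seq) + 1}"   # hypothetical model stub
out = sliding_window_predict(sample, stub)
# out -> ["GET", "pred2", "pred3"]; seq always held real features
```

Because `seq` only ever contains intercepted sample features, a wrong prediction at one step cannot corrupt the conditioning context of later steps.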
The flow prediction sequence is gradually constructed in a recursion prediction mode, so that the understanding capability of the model to the flow mode can be enhanced, a more accurate flow mode learning basis is provided for subsequent attack detection tasks, and the detection performance and adaptability of the model in a complex network environment are improved.
In some embodiments, in order to achieve customized optimization for a specific task while maintaining the overall structure of the model stable, a low-rank adaptation layer (LoRA) may be introduced to perform efficient fine tuning on the basis of a second preset large language model (such as LLaMA 2), so that consumption of computing resources and training time can be reduced, and performance degradation possibly caused by large-scale adjustment of the whole model can be effectively avoided. For example, step 206 may include:
(206.1) adding a low rank adaptation layer in the self-attention layer in the second pre-set large language model;
(206.2) adjusting parameters of the low-rank adaptation layer based on the second loss to obtain a large language model.
The self-attention layer can be one of key components in a large language model, and allows the large language model to pay attention to the relation between different positions inside a sequence when processing sequence data (such as text or network traffic), so as to dynamically adjust the importance degree of each part in an input sequence, and further more accurately capture long-distance dependency and complex mode characteristics.
The low-rank adaptation layer (Low-Rank Adaptation, LoRA) may be a trainable parameter layer introduced through low-rank matrix decomposition on the basis of the second preset large language model weights.
In some implementations, a low rank adaptation layer may be added to some or all of the positions in the four weight matrices of query (Q), key (K), value (V), output (O), with task specific knowledge injected by low rank decomposition. For example, the method can be added to a query matrix (Q) and a value matrix (V), wherein the query matrix (Q) can be used for learning attention focusing modes (such as sensitive fields in attention attack loads) related to tasks, and the value matrix (V) can be used for adjusting feature representations after attention weighting (such as semantic coding of enhanced malicious traffic features).
In some implementations, a low rank adaptation layer may be applied to the weight matrix of the feedforward network of the model, the embedding layer of the model, and so on, in addition to the self-attention layer.
Illustratively, the parameters of the low-rank matrices A and B may be updated by a back propagation algorithm according to the value of the second loss. After multiple iterations, the parameters of the low-rank adaptation layer are gradually adjusted to an optimal state.
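The low-rank update can be illustrated numerically: the frozen weight W is never modified, and only the small factors A and B are trained, so the effective weight applied at inference is W + B·A. The matrices below are toy values, not the application's parameters:

```python
def matmul(X, Y):
    """Naive matrix product for small illustration matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_weight(W, A, B):
    """Effective weight W' = W + B @ A; W stays frozen, only the
    low-rank factors A (r x d_in) and B (d_out x r) are trained."""
    delta = matmul(B, A)
    return [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

# 2x2 frozen weight adapted with rank r = 1. For a d x d weight the
# trainable parameter count drops from d*d to 2*r*d.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]              # r x d_in
B = [[1.0], [0.0]]            # d_out x r
W_eff = lora_weight(W, A, B)
# W_eff -> [[1.5, 0.5], [0.0, 1.0]]
```

This is why the method avoids large-scale adjustment of the whole model: gradients from the second loss flow only into A and B.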
By adopting LoRA method to fine-tune the second preset large language model, a small amount of update can be carried out on the large-scale pre-training model, thereby realizing the high-efficiency and customized tuning of specific tasks. In the fine tuning process, the second preset large language model is trained to better understand and identify complex security threats by combining the fine tuning data set and specific tasks in the network security field, such as attack type classification, entity identification, abnormal flow detection and the like, so that the language understanding capability of the second preset large language model is fully exerted, and meanwhile, the efficient application of the second preset large language model in the network security field is ensured.
In some embodiments, in order to provide a basis for subsequent large language model training, multiple types of network security data can be collected and processed, cleaned and labeled in a classification manner, and finally a high-quality pre-training data set is constructed so as to ensure that the model can learn rich network traffic patterns and security threat features, thereby improving the detection capability of novel or unknown attacks. For example, before step 201, that is, "obtaining the first sample to be detected flow from the preset pre-training dataset and performing recursive prediction based on the first sample to be detected flow through the first preset large language model to obtain the predicted to be detected flow", the method may further include:
(A.1) acquiring attack load data, normal flow data, attack flow data and vulnerability code data;
(A.2) performing data cleaning processing on attack load data, normal flow data, attack flow data and vulnerability code data to obtain a plurality of initial sample data;
(A.3) respectively marking a plurality of characteristic fields contained in each initial sample data according to a plurality of preset classification fields, to obtain a first sample flow to be detected corresponding to each initial sample data;
(A.4) constructing a pre-training data set according to the plurality of first sample flows to be detected.
The attack load data may be threat content actually transmitted in the network attack, including an original attack character string, an encoded/deformed load, such as a Base64 encoded SQL injection statement, a binary load uploaded by a malicious file, and the like.
The normal traffic data may be normal network traffic that does not include an attack behavior, and is used for establishing a baseline model to distinguish normal behavior from abnormal behavior, for example, the normal traffic data may be a normal HTTP request, HTTPs encrypted communication traffic, JSON response normally invoked by an application programming interface (Application Programming Interface, API), and the like.
The attack traffic data may be a network traffic sample recording attack behaviors and includes a complete interaction process of attack request and response, for example, the attack traffic data may be XML entity injection traffic of XXE attack, cross-domain request traffic of Cross-site request forging (CSRF) attack, flood data packets of distributed denial of service attack (Distributed Denial of Service, DDoS attack), and the like.
Wherein the vulnerability code data may be example code or description of known vulnerabilities for associating attack traffic with potential vulnerabilities.
The initial sample data may be the structured data after cleaning (denoising, de-duplication, error correction) and normalization (JSON formatting), and contains core information of the original data.
The classification field may be a predefined field for marking sample categories and attributes, such as data type, attack type, vulnerability association, risk level, time stamp, etc.
The feature field may be detailed information describing specific characteristics of each initial sample data, such as network layer features (e.g., source IP, destination IP, port number, protocol type), application layer features (e.g., HTTP request header, request body, response status code), attack features (e.g., malicious pattern in payload, encoding mode), context features (e.g., timestamp, session ID, traffic size), etc.
By way of example, original SQL injection and XSS attack payload files (e.g., payload.txt) may be downloaded from a public vulnerability library (e.g., Exploit-DB). Alternatively, tools may be used to generate coding-variant attack payloads, such as Base64 coding or URL coding of SQL statements.
For example, normal traffic data may be obtained from an enterprise intranet (e.g., a user browsing behavior log of an e-commerce website), or from a public dataset.
For example, attack traffic data may be obtained from a Packet Capture (PCAP) file of an attack session generated by Metasploit (e.g., a buffer overflow attack), or from a complete attack chain captured in a penetration test (e.g., from a port scan to an exploit).
For example, vulnerability code data may be obtained from security bulletins or from exploit code associated with a common vulnerability and exposure database (Common Vulnerabilities and Exposures, CVE) database.
In some embodiments, when performing data cleaning processing on the attack load data, normal flow data, attack flow data and vulnerability code data to obtain a plurality of initial sample data, regular expressions may first be used to remove abnormal traffic containing non-ASCII characters (such as binary obfuscated data), and through HTTP status code verification, server error records with a status code of 500 and an empty response body are deleted. Thereafter, the HTTP request headers are reordered alphabetically (e.g., unifying the Host field location), repeated payloads are deleted (by computing a hash value for each attack payload and deleting repeated samples with the same hash), and so forth. Finally, the PCAP files are parsed into JSON format (key metadata is preserved) for structural processing.
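The cleaning pipeline described above (ASCII filtering, status-code verification, hash-based deduplication, header normalization, JSON output) can be sketched as follows; the record field names are illustrative assumptions, not the application's exact schema:

```python
import hashlib
import json

def clean_samples(records):
    """Filter, deduplicate, and normalize raw records into initial
    sample data (illustrative field names)."""
    seen, cleaned = set(), []
    for rec in records:
        payload = rec.get("payload", "")
        # Drop traffic containing non-ASCII characters (e.g. binary
        # obfuscated data) and empty HTTP 500 error records.
        if not payload.isascii():
            continue
        if rec.get("status") == 500 and not rec.get("body"):
            continue
        # Deduplicate by attack payload hash.
        digest = hashlib.sha256(payload.encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        # Reorder headers alphabetically and keep the result as JSON.
        headers = dict(sorted(rec.get("headers", {}).items()))
        cleaned.append(json.dumps({"payload": payload, "headers": headers}))
    return cleaned

raw = [
    {"payload": "id=1' OR 1=1--", "headers": {"Host": "a", "Accept": "*"}},
    {"payload": "id=1' OR 1=1--", "headers": {}},                 # duplicate
    {"payload": "\xff\xfe", "headers": {}},                        # non-ASCII
    {"payload": "ok", "status": 500, "body": "", "headers": {}},   # 500/empty
]
samples = clean_samples(raw)   # only the first record survives
```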
Illustratively, the classification field may be a protocol type, an attack stage, an attack carrier, and so on. For example, SQL injection samples can be marked as { "protocol": "HTTP/1.1", "attack stage": "exploit", "carrier type": "text", "CWE-ID": "CWE-89" }, normal login traffic is marked as { "protocol": "HTTPS", "attack stage": "none", "carrier type": "normal" }, so that a plurality of first sample traffic to be detected can be obtained to describe and distinguish different types of network traffic more accurately, thereby providing richer semantic information for model training and improving the recognition capability and detection accuracy of the model on attack traffic.
In some embodiments, the feature field may be set according to the actual situation, and may also include a timestamp, for example.
In some embodiments, the pre-training data set may be generated from the plurality of first sample flows to be detected. Alternatively, the plurality of first sample flows to be detected may be divided into a pre-training data set and a verification set according to a preset proportion, for example, a training set and a verification set in a proportion of 7:3.
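The 7:3 division can be sketched as a deterministic slice (in practice the samples would typically be shuffled first):

```python
def split_dataset(samples, train_ratio=0.7):
    """Split sample flows into a pre-training set and a verification
    set according to a preset proportion (7:3 by default)."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

flows = [f"flow{i}" for i in range(10)]
train, val = split_dataset(flows)   # 7 training, 3 verification samples
```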
By the method, the accuracy and consistency of data are improved, and abundant semantic information is provided for model training, so that the recognition capability and detection precision of the model to network attacks are enhanced, and the intelligent level of network security protection and the capability of coping with complex attack scenes are effectively improved.
In some embodiments, in order to improve the performance of the large language model in a specific network security task, a comprehensive task instruction set may be constructed by systematically generating and expanding seed instructions, and a fine-tuning data set may be generated based on the task instructions and corresponding sample flow detection results, so as to ensure that the second preset large language model may perform fine training for multiple network attack categories, thereby improving understanding and recognition capabilities of complex attack behaviors. For example, before step 204, "obtain the second sample to-be-detected flow from the preset fine adjustment data set, the corresponding sample flow detection result, and the plurality of sample detection instructions corresponding to the second sample to-be-detected flow, and sequentially input the second sample to-be-detected flow and each sample detection instruction into the second preset large language model to obtain the predicted flow detection result", before the step, further include:
(B.1) acquiring a plurality of preset instruction types;
(B.2) generating a plurality of seed instructions corresponding to a plurality of instruction types aiming at the flow to be detected of each second sample;
(B.3) expanding task instructions corresponding to each seed instruction according to a plurality of preset network attack categories, to obtain a plurality of sample detection instructions corresponding to each second sample flow to be detected under the plurality of network attack categories;
(B.4) obtaining a sample flow detection result corresponding to each second sample flow to be detected, and generating a fine tuning data set based on the second sample flows to be detected, the corresponding sample detection instructions and the corresponding sample flow detection results.
The instruction type can be attack type classification, attack load characteristic identification, attack entity information identification and the like.
The network attack class may be different types of network attacks, such as injection class attacks, cross-site scripting attacks, exploit class attacks, protocol layer attacks, and so on. Each network attack class represents a particular class of attack patterns or methods for classifying and tagging related network traffic data.
The seed instruction may be a basic instruction template generated under an attack detection scene according to an instruction type, and the basic instruction template may include description of a specific attack mode, identification requirements on attack traffic characteristics and classification requirements on attack behaviors. For example, the seed instruction may contain a basic sentence pattern such as "analyze the following HTTP request to determine the attack type".
The task instruction may be a diversified instruction formed by expanding the seed instruction by a self-guiding method. For example, for the same flow, a plurality of task instructions can be generated, and problems are respectively raised from the angles of attack type classification, load feature recognition, entity information recognition and the like, so that the model can be comprehensively and deeply learned and understood.
For example, a plurality of seed instructions corresponding to each second sample flow to be detected may be generated according to the instruction type. Taking an SQL injection attack as an example, the seed instructions may include an attack type classification instruction such as "identify whether the traffic is an SQL injection attack", an attack payload feature identification instruction such as "extract the SQL injection features in the traffic, such as ' OR 1=1", an attack entity information identification instruction such as "identify the attacker IP address and target URL in the traffic", and so on.
In some embodiments, common attack detection scenarios may first be combed, network attack categories of attack detection tasks are defined based on the attack detection scenarios, and subtask divisions are set for each task; for example, the tasks are divided into subtasks such as attack type classification, attack load feature identification, attack technique identification, abnormal traffic detection, attack behavior interpretation, and the like. Each network attack category may be further refined; e.g., SQL injection in an injection attack may be further divided into Boolean blind injection, time-based blind injection, union query injection, error-based injection, etc.
In some embodiments, the task instruction corresponding to each network attack category may be extended based on the seed instruction, for example, for the second sample to-be-detected traffic a, where the corresponding seed instruction may be to identify whether the second sample to-be-detected traffic a is attack traffic. After the dimensions of attack type classification, attack load characteristic identification, attack manipulation identification, abnormal flow detection, attack behavior interpretation and the like are expanded, a plurality of sample detection instructions corresponding to the flow a to be detected of the second sample can be obtained. Further, the seed instructions and corresponding sample detection instructions may be generated by a large language model, such as GPT-3.5, GPT-4, and the like.
In some implementations, the attack type classification instruction of the seed instructions may be "determine whether the HTTP request belongs to SQL injection: GET /search?q=1' AND (SELECT … FROM users)-- HTTP/1.1", the attack payload feature identification instruction may be "extract the abnormal parameter in the payload: username=admin' UNION SELECT 1,@@version,3--", and the attack entity information identification instruction may be "identify the attacker IP and vulnerability parameters in the log: 2023-05-01 14:22 [XSS] source IP: 10.0.0.5 request path: /comment?content=<script>…". The above is merely an example, and specific seed instructions may be set according to actual situations, which is not excessively limited by the present application.
Further, a regular expression can be used for automatically extracting key fields of the flow to be detected of each second sample, and the sample detection instructions of the flow to be detected of the second sample under various seed instructions are assisted to be expanded.
In some embodiments, the seed instruction may be further subjected to an instruction complexity upgrade process, a cross-task combination process, and a defense association process, so as to expand the corresponding task instructions based on the seed instruction and obtain a plurality of sample detection instructions. The instruction complexity upgrade process may upgrade a single classification task into a multi-step inference task; for example, the seed instruction "determine whether this is SQL injection" may be extended into at least one task instruction such as "analyze whether the following request parameters have time-based blind injection features and explain the basis of the determination". The cross-task combination process may merge classification and feature extraction; for example, the extended task instruction may be "identify the payload type (reflected/stored) of the XSS attack and extract the dangerous functions in the malicious script". The defense association process may generate repair suggestions based on attack features; for example, the extended task instruction may be "generate three defense schemes for the detected XXE attack, required to include XML parser configuration modifications".
For example, the original seed instruction may be "determine whether the following request is an SQL injection attack", and the corresponding second sample to-be-detected traffic may be "GET /product?id=1' OR 1=1--". After generating a plurality of sample detection instructions corresponding to the second sample to-be-detected traffic based on the seed instruction, the following sample detection instructions may be obtained:
Analyze the logical structure in the request parameters and determine whether the SQL injection attack belongs to union query injection, Boolean blind injection, or always-true condition injection (the corresponding sample flow detection result is "always-true condition injection; authentication is bypassed by constructing the always-true logic 1' OR 1=1");
Identify the SQL injection payload features in the following request, including the closer, logical operators, and comments, and explain their roles (the corresponding sample flow detection result is "1. Closer: a single quote (') used to close the original query parameter; 2. Logical operator: OR 1=1 constructs an always-true condition; 3. Comment symbol: -- used to truncate the subsequent query statement");
Assuming that the request is directed at the user login interface, infer the attacker's intent and the data that may be leaked (the corresponding sample flow detection result is "attack intent: bypass identity verification; possibly leaked data: usernames/passwords in the user table").
The above is merely an example, and in actual situations, the extended seed instruction may be extended through multiple instruction dimensions. For example, the multiple instruction dimensions may include an attack subclass classification instruction, an attack feature identification instruction, an attack context association instruction, a defense suggestion generation instruction, an attack impact assessment instruction, a compound task instruction, an attack deformation detection instruction, and the like, and the seed instruction is extended from the multiple instruction dimensions, so that the full attack detection flow can be covered, which is helpful for enhancing understanding and identification capability of the model to different attack modes in the training process of the model.
For example, each second sample to-be-detected traffic may be associated with a plurality of sample detection instructions (e.g., 10 to 20, etc.), each sample detection instruction corresponding to a sample traffic detection result. And generating a data triplet based on the flow to be detected of each second sample, the corresponding associated sample detection instruction and the corresponding associated sample flow detection result, and generating a fine tuning data set according to the plurality of data triples.
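Assembling the fine tuning data set from data triplets can be sketched as follows; the dictionary layout is an illustrative assumption, not the application's storage format:

```python
def build_finetune_dataset(flows):
    """Each entry in `flows` maps a second sample to-be-detected
    traffic to its (instruction, detection result) pairs; emit one
    data triplet per pair."""
    triplets = []
    for flow, pairs in flows.items():
        for instruction, result in pairs:
            triplets.append(
                {"traffic": flow, "instruction": instruction, "result": result}
            )
    return triplets

flows = {
    "GET /product?id=1' OR 1=1--": [
        ("Determine whether the request is SQL injection",
         "attack type: SQL injection"),
        ("Extract the payload features",
         "closer: ' ; logic: OR 1=1 ; comment: --"),
    ],
}
dataset = build_finetune_dataset(flows)   # two triplets for one flow
```

One flow associated with 10 to 20 instructions would simply yield 10 to 20 triplets under the same scheme.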
By the method, the fine-tuning data set is generated, rich attack scene coverage is provided for model training, so that a large language model can accurately learn complex characteristics and potential relations of network attack, and meanwhile, the subsequent large language model can accurately understand instructions and give correct detection results.
In some embodiments, in order to improve the accuracy of identifying precision and risk assessment of network attack, feature extraction and structuring processing can be performed on the flow detection result, and the flow detection result is mapped into a preset security knowledge base, so that corresponding vulnerability entries and threat levels thereof are determined, and a targeted processing scheme is finally generated. Therefore, the overall network security protection capability can be effectively enhanced. For example, after step 102, that is, "inputting the flow to be detected and the detection instruction into the large language model to obtain the flow detection result corresponding to the flow to be detected", the method may further include:
(C.1) obtaining preset vulnerability classification and carrying out feature extraction on a flow detection result to obtain structured data of corresponding attack features;
(C.2) acquiring a preset security knowledge base, mapping the structured data in the security knowledge base, and determining vulnerability entries and threat levels corresponding to the structured data;
(C.3) generating a corresponding processing scheme based on the vulnerability entry and the threat level corresponding to the flow detection result.
The vulnerability classification may be a systematic classification of known security vulnerabilities, such as the OWASP Top 10 or CWE (Common Weakness Enumeration); each category represents a specific type of security problem or vulnerability, facilitating identification and management.
The attack feature may be a specific identifier or pattern extracted from the flow detection result, used to describe the key characteristics of an attack behavior. For example, in an SQL injection attack, it may include particular SQL syntax, abnormal packet sizes, and the like.
The structured data may be a standardized, easy-to-parse data format into which the original flow detection result is converted. It typically includes information such as field names, types, and values, so that the data can be efficiently processed by a computer program.
The security knowledge base may be a database or knowledge system, such as OWASP, CWE, etc., containing various known security vulnerabilities together with their detailed descriptions, ratings, solutions, and so on, for the model to compare against and map to.
The vulnerability entry may be a specific record item in the security knowledge base, and describes detailed information of a specific vulnerability, including its name, description, scope of influence, repair suggestion, and the like.
The threat level may be a risk assessment level assigned based on factors such as the potential hazard and exploitability of the vulnerability, and is generally classified into three levels, high, medium, and low, so as to help a security team prioritize the most urgent problems.
The processing scheme may be a specific coping strategy or operation guide generated according to the vulnerability entry and the threat level thereof, and aims to instruct a security team to take appropriate defensive measures to mitigate or eliminate the threat. For example, for SQL injection attacks, the processing scheme may include suggestions to implement input validation and parameterized queries, etc.
For example, after a flow detection result is obtained for a certain HTTP POST request flow, the structured data of the corresponding attack features may be extracted from the flow detection result or the flow to be detected according to the preset vulnerability classification. The extracted fields may include an attack type tag, a confidence value, an attack vector, attack payload features, abnormal operator combinations, a timestamp, and so forth. For example, the attack type tag may be "SQL_injection" with a confidence value of 0.98, and the attack vector may be a POST request to the "/user/login" interface. The attack payload is then parsed: the abnormal operator combination in the parameter "username=admin' OR 1=1--" is extracted, and a feature pattern comprising the logical operator "OR", the always-true condition "1=1", and the SQL comment "--" is identified. Context metadata, including the source IP (192.168.1.105), the target port (443), the protocol type (HTTPS), and the timestamp (2025-03-15T14:22:35+08:00), is also extracted, and the structured data is then mapped in the security knowledge base.
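The extraction step above can be sketched as follows. This is a minimal, hypothetical illustration only: the pattern table, field names, and the simple confidence rule are assumptions for exposition, not the model-based extraction the embodiment describes.

```python
import re

# Hypothetical sketch: extract SQL-injection indicators from one request
# parameter and emit structured data; all field names are illustrative.
SQLI_PATTERNS = {
    "logical_operator": re.compile(r"\b(OR|AND)\b", re.IGNORECASE),
    "always_true": re.compile(r"\b1\s*=\s*1\b"),
    "sql_comment": re.compile(r"--|#"),
}

def extract_features(param_value, context):
    """Return structured attack-feature data for one request parameter."""
    matched = [name for name, pat in SQLI_PATTERNS.items()
               if pat.search(param_value)]
    return {
        "attack_type": "SQL_injection" if matched else "none",
        # Toy confidence: fraction of known patterns that matched.
        "confidence": round(len(matched) / len(SQLI_PATTERNS), 2),
        "matched_patterns": matched,
        **context,  # source IP, target port, protocol, timestamp, etc.
    }

features = extract_features(
    "admin' OR 1=1 --",
    {"src_ip": "192.168.1.105", "dst_port": 443, "protocol": "HTTPS"},
)
```

For the payload "admin' OR 1=1 --" all three patterns match, so the sketch emits an "SQL_injection" tag alongside the context metadata.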
Further, the structured data may be matched against a CWE knowledge base. For example, the semantic matching engine of the CWE knowledge base may be used to establish a strong association between the "SQL_injection" attack type and the vulnerability entry CWE-89 (Improper Neutralization of Special Elements used in an SQL Command), while also associating the vulnerability entry CWE-943 (Improper Neutralization of Special Elements in Data Query Logic).
Further, according to the OWASP Top 10 (2021) classification standard, the flow detection result can be mapped to the A03:2021 (Injection) category, and the corresponding risk features can be extracted. For example, the risk features may be: low attack complexity (exploitable without authentication), high potential impact (possibly leading to complete leakage of the database), and a lack of defense mechanisms (no parameterized query usage detected). Furthermore, the threat level corresponding to the flow detection result can be evaluated and determined automatically by the model, or can be determined by technicians.
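The mapping into the knowledge base can be sketched as a lookup plus a simple escalation rule. The CWE and OWASP identifiers below paraphrase real entries, but the table layout and the confidence-based escalation rule are illustrative assumptions, not the semantic matching engine itself.

```python
# Hypothetical in-memory stand-in for a security knowledge base.
KNOWLEDGE_BASE = {
    "SQL_injection": {
        "cwe": ["CWE-89", "CWE-943"],
        "owasp": "A03:2021 (Injection)",
        "base_severity": "high",
    },
    "XSS": {
        "cwe": ["CWE-79"],
        "owasp": "A03:2021 (Injection)",
        "base_severity": "medium",
    },
}

def map_to_knowledge_base(structured):
    """Map structured attack data to vulnerability entries and a threat level."""
    entry = KNOWLEDGE_BASE.get(structured["attack_type"])
    if entry is None:
        return {"vulnerability": None, "threat_level": "low"}
    # Assumed rule: keep the base severity only for high-confidence detections.
    level = entry["base_severity"] if structured["confidence"] >= 0.9 else "medium"
    return {"vulnerability": entry, "threat_level": level}

result = map_to_knowledge_base({"attack_type": "SQL_injection", "confidence": 0.98})
```

A real system would query CWE/OWASP data rather than a hard-coded dictionary; the point here is only the structured-data-to-entry-and-level flow.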
For example, the corresponding processing scheme may be automatically generated based on the vulnerability entries, risk features, threat levels, etc. corresponding to the flow detection result. In particular, the processing scheme may include a three-level response strategy comprising immediate disposal measures, repair suggestions, and priority handling. For example, the immediate disposal measures may be to dynamically insert a protection rule through the WAF (e.g., real-time blocking of POST requests containing the consecutive special characters "' --"), to activate a request redirection mechanism, to direct attack traffic to the honeypot system for behavior tracking, and the like. Further, the repair suggestions may be given at the code layer (e.g., enforcing the use of PreparedStatement at the DAO layer, performing whitelist checking on user input, etc.), at the configuration layer (e.g., adding SQLi-feature regular-expression filter rules in the Nginx configuration, etc.), and at the architecture layer (e.g., suggesting deployment of a database firewall to perform SQL syntax tree analysis, etc.).
Further, priority handling may include generating a high-severity security event work order, automatically pushing vulnerability details to related systems, sending threat intelligence to a Security Operations Center (SOC) platform, and so forth.
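The three-level response strategy above can be sketched as a simple scheme builder. The field names, rule texts, and the high-threat trigger for priority handling are assumptions chosen to mirror the examples in this description, not a prescribed implementation.

```python
def generate_processing_scheme(vuln_entry, threat_level):
    """Assemble a three-level response strategy (illustrative field names)."""
    scheme = {
        # Immediate disposal measures, e.g. WAF rules and honeypot redirection.
        "immediate": [
            "insert WAF rule blocking POST requests with \"' --\"",
            "redirect attack traffic to honeypot for behavior tracking",
        ],
        # Repair suggestions at code, configuration, and architecture layers.
        "repair": {
            "code": "use parameterized queries (PreparedStatement) at the DAO layer",
            "config": "add SQLi regular-expression filter rules to the proxy config",
            "architecture": "deploy a database firewall for SQL syntax tree analysis",
        },
        "priority": [],
    }
    # Assumed rule: priority handling is triggered only for high threat levels.
    if threat_level == "high":
        scheme["priority"] = [
            "generate high-severity security event work order",
            "push vulnerability details to related systems",
            "send threat intelligence to the SOC platform",
        ]
    return scheme

scheme = generate_processing_scheme({"cwe": ["CWE-89"]}, "high")
```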
In this way, attack features can be extracted rapidly and accurately, and the corresponding vulnerability entry and threat level can be determined, thereby improving the efficiency of vulnerability identification and threat assessment, enhancing the automation and intelligence of security operations, enabling the security team to rapidly respond to and handle high-risk security events, optimizing the security operation flow, and improving the overall network security protection capability.
Referring to fig. 4, a general embodiment of the present application will be described. Illustratively, original network traffic packets (PCAP files) may first be captured and collected from a network environment, then decoded and reassembled to extract key information such as HTTP requests and responses, converting the binary data into a readable text format.
Further, data related to network attack detection, such as attack payloads, normal traffic, attack traffic samples, and vulnerability code, may be collected from the captured network traffic packets, and the collected data may be cleaned and preprocessed. This may specifically include removing irrelevant traffic, decoding TCP streams, extracting HTTP key information, and performing data deduplication and formatting, so as to construct a high-quality pre-training data set.
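The cleaning and deduplication step can be sketched as follows, assuming the traffic has already been decoded from PCAP into per-request records; the record fields and the deduplication key are illustrative assumptions.

```python
# Hypothetical sketch of the cleaning step: filter, normalize, and
# deduplicate HTTP records already decoded from a PCAP capture.
def clean_records(records):
    seen = set()
    cleaned = []
    for rec in records:
        # Remove irrelevant traffic (anything that is not HTTP here).
        if rec.get("protocol") != "HTTP":
            continue
        # Normalize, then deduplicate on (method, URI, body).
        key = (rec["method"].upper(), rec["uri"].strip(), rec.get("body", ""))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"method": key[0], "uri": key[1], "body": key[2]})
    return cleaned

samples = clean_records([
    {"protocol": "HTTP", "method": "post", "uri": "/user/login ", "body": "u=a"},
    {"protocol": "HTTP", "method": "POST", "uri": "/user/login", "body": "u=a"},
    {"protocol": "DNS", "method": "", "uri": "", "body": ""},
])
```

After normalization the first two records collapse into one, and the non-HTTP record is dropped, leaving a single clean sample.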
Further, a plurality of seed instructions corresponding to a plurality of network attack categories can be generated from the flow to be detected of each second sample, and the seed instructions can be expanded to obtain the flows to be detected of a plurality of second samples, a plurality of corresponding sample detection instructions, and a plurality of corresponding sample flow detection results, so as to generate a fine-tuning data set.
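The seed-instruction expansion can be sketched as combining a few seed questions per attack category with phrasing templates to produce instruction/traffic/result triples. The seed texts and templates below are invented placeholders; a real system might expand seeds with the language model itself.

```python
import itertools

# Hypothetical seed instructions per network attack category.
SEED_INSTRUCTIONS = {
    "SQL_injection": ["Does this HTTP request contain a SQL injection attack?"],
    "XSS": ["Identify any cross-site scripting payload in this request."],
}
# Hypothetical phrasing templates used to expand each seed.
TEMPLATES = ["{q}", "As a security analyst, answer: {q}"]

def build_finetune_dataset(samples):
    """samples: list of (category, traffic_text, detection_result) triples."""
    dataset = []
    for category, traffic, result in samples:
        for seed, tpl in itertools.product(SEED_INSTRUCTIONS[category], TEMPLATES):
            dataset.append({
                "instruction": tpl.format(q=seed),
                "input": traffic,
                "output": result,
            })
    return dataset

data = build_finetune_dataset([
    ("SQL_injection",
     "POST /user/login username=admin' OR 1=1 --",
     "SQL injection detected"),
])
```

One seed expanded through two templates yields two fine-tuning triples for the single sample flow.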
Furthermore, based on the pre-training data set, the first preset large language model can be trained a second time to obtain the second preset large language model, so that the model fully learns traffic patterns, improving the accuracy of network attack traffic detection. Then, the second preset large language model is fine-tuned on the fine-tuning data set so that the model gains the ability to accurately align questions with answers, finally yielding the large language model.
Furthermore, the flow to be detected can be input into a large language model to obtain a corresponding flow detection result. In order to improve the overall capability of network security protection, the obtained flow detection result can be mapped in a security knowledge base to determine corresponding vulnerability entries and threat levels, and a specific processing scheme is generated according to the vulnerability entries and the threat levels, including generating a security event work order, pushing vulnerability details to related systems, sending threat information to an SOC platform and the like, so as to provide detailed threat analysis and response suggestions.
Referring to fig. 5, the embodiment of the present application further provides a network attack traffic detection device, which can implement the above network attack traffic detection method, where the network attack traffic detection device includes:
the obtaining module 51 is configured to obtain a flow to be detected, and a detection instruction generated according to the flow to be detected;
the input module 52 is configured to input the flow to be detected and the detection instruction into the large language model, so as to obtain a flow detection result corresponding to the flow to be detected;
wherein the large language model is obtained by performing traffic pattern learning training on a first preset large language model based on a first loss to obtain a second preset large language model, and performing instruction understanding training on the second preset large language model based on a second loss;
the first loss is determined by the first preset large language model according to the difference between a predicted flow to be detected and the first sample flow to be detected, the predicted flow to be detected being obtained by the first preset large language model through recursive prediction based on the first sample flow to be detected;
the recursive prediction process is a recursive process of generating an intermediate prediction sequence based on a first flow feature corresponding to a first time step of the first sample flow to be detected, and predicting the next flow feature based on the intermediate prediction sequence so as to update the intermediate prediction sequence with the next flow feature;
and the second loss is determined by the second preset large language model based on the difference between the predicted flow detection result output for the second sample flow to be detected and the sample flow detection result.
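The two losses can be illustrated with a toy sketch. This greatly simplifies the actual training: the "model" is just a callable returning a probability distribution over a tiny vocabulary, and both losses are plain cross-entropies, which is an assumption about the form of the "difference" the description refers to.

```python
import math

def recursive_prediction_loss(model, sample_sequence):
    """First loss (sketch): autoregressive next-feature prediction.
    The intermediate prediction sequence starts from the first time step's
    feature and is extended one feature at a time."""
    loss, prefix = 0.0, [sample_sequence[0]]
    for target in sample_sequence[1:]:
        probs = model(prefix)             # distribution over the next feature
        loss += -math.log(probs[target])  # cross-entropy for this step
        prefix.append(target)             # update the intermediate sequence
    return loss / (len(sample_sequence) - 1)

def detection_loss(model, traffic, sample_result):
    """Second loss (sketch): cross-entropy between the predicted detection
    result for the sample flow and the labelled sample detection result."""
    probs = model(traffic)
    return -math.log(probs[sample_result])

# Toy model: uniform distribution over a 4-symbol vocabulary.
uniform = lambda _: {s: 0.25 for s in "abcd"}
l1 = recursive_prediction_loss(uniform, list("abca"))
```

For the uniform toy model every step costs -log(1/4), so the averaged first loss equals log 4; a trained model would drive both losses down by concentrating probability on the correct next feature or detection result.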
The specific implementation of the network attack traffic detection device is basically the same as the specific embodiment of the network attack traffic detection method, and will not be described herein. On the premise of meeting the requirements of the embodiment of the application, other functional modules can be arranged in the network attack flow detection device so as to realize the network attack flow detection method in the embodiment.
The embodiment of the application also provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the network attack flow detection method when executing the computer program. The computer equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.
Referring to fig. 6, fig. 6 illustrates a hardware structure of a computer device according to another embodiment, where the computer device includes:
the processor 61 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, etc., for executing related programs to implement the technical solution provided by the embodiments of the present application;
the memory 62 may be implemented in the form of Read-Only Memory (ROM), static storage, dynamic storage, or Random Access Memory (RAM). The memory 62 may store an operating system and other application programs; when the technical solution provided in the embodiments of the present disclosure is implemented by software or firmware, the relevant program code is stored in the memory 62, and the processor 61 invokes and executes the network attack traffic detection method of the embodiments of the present disclosure;
An input/output interface 63 for implementing information input and output;
the communication interface 64 is configured to implement communication interaction between this device and other devices, and may implement communication in a wired manner (such as USB, network cable, etc.) or in a wireless manner (such as mobile network, Wi-Fi, Bluetooth, etc.);
A bus 65 for transferring information between the various components of the device (e.g., processor 61, memory 62, input/output interface 63, and communication interface 64);
Wherein the processor 61, the memory 62, the input/output interface 63 and the communication interface 64 are in communication connection with each other inside the device via a bus 65.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program which realizes the network attack flow detection method when being executed by a processor.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by persons skilled in the art that the embodiments of the application are not limited by the illustrations, and that more or fewer steps than those shown may be included, or certain steps may be combined, or different steps may be included.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" is used to describe an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of" and the like refers to any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may be singular or plural.
In the several embodiments provided by the present application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative, e.g., the division of the above elements is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including multiple instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present application. The storage medium includes various media capable of storing programs, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, and are not thereby limiting the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.