CN114338436B - Network traffic file identification method and device, electronic equipment and medium - Google Patents
- Publication number
- CN114338436B (application number CN202111632893.8A)
- Authority
- CN
- China
- Prior art keywords
- protocol
- file
- network flow
- data
- network traffic
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The embodiments of the present application disclose a network traffic file identification method and device, an electronic device, and a computer-readable storage medium. An acquired network traffic file is parsed according to a preset protocol parsing mode to obtain its protocol information. Because the distribution of service data is strongly correlated with protocols, the protocol information is matched against a preset service database to determine the proportion of service data in the network traffic file. When the proportion is greater than or equal to a preset threshold, the network traffic file contains a large amount of service data and is determined to be a valid file. By exploiting the strong correlation between service data and protocols and analyzing the protocol information contained in the network traffic file, the scheme automatically identifies whether the file contains the required service data without manual analysis, which improves the efficiency of service data identification.
Description
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method and apparatus for identifying a network traffic file, an electronic device, and a computer readable storage medium.
Background
With the development of new technologies such as 5G (fifth-generation mobile communication technology) and mobile computing, Internet of Things (IoT) scenarios have been rapidly deployed in recent years. As the number of IoT devices grows rapidly, the scope of impact of vulnerabilities, and the losses they cause, have become enormous. In recent years, a number of IoT vulnerabilities and attacks with a wide scope of impact have been disclosed, affecting hundreds of millions of IoT devices. Against this background of frequent IoT attacks, it is necessary to identify, analyze, and sort out IoT devices, and in particular to classify them at a fine granularity, so that security measures can be better tailored to different devices and different vulnerabilities and the attack surface of IoT devices is reduced.
For IoT device identification, packet capture is currently performed mainly by a collector who visits the customer site, captures the network traffic files passing through the devices, and sends them to an analyst. The analyst examines the traffic with several specialized analysis platforms or tools and extracts device fingerprints to label the devices. Each class of device, or each device model, has its own unique device fingerprint. The device fingerprint generally resides in the service data within the network traffic file, so when a captured network traffic file contains no service data, the analyst cannot extract a device fingerprint from it. Determining whether a network traffic file contains service data is therefore an important prerequisite for device identification.
At present, whether a network traffic file contains service data is usually determined by manual analysis. In practice, the collector captures device network traffic files on site and returns them to the analyst; if the analyst finds no service data, this is fed back to the collector, who must return to the site to capture again. This approach is time-consuming and labor-intensive. The analysis platforms and tools are not designed for on-site collection at customer premises: they are used at the analyst's company, and analysis is not a quick process. If the analyst went to the site to collect and analyze at the same time, the analyst's workload would increase, and the platforms and tools are not suited to such on-site collection and analysis.
Therefore, improving the efficiency of service data identification is a problem that those skilled in the art need to solve.
Disclosure of Invention
An object of an embodiment of the present application is to provide a method, an apparatus, an electronic device, and a computer readable storage medium for identifying a network traffic file, which can improve the identification efficiency of service data.
In order to solve the above technical problems, an embodiment of the present application provides a method for identifying a network traffic file, including:
parsing the acquired network traffic file according to a preset protocol parsing mode to obtain protocol information of the network traffic file;
matching the protocol information against a preset service database to determine the proportion of service data in the network traffic file; and
when the proportion is greater than or equal to a preset threshold, determining that the network traffic file is a valid file.
Optionally, the parsing the acquired network traffic file according to the preset protocol parsing mode to obtain the protocol information of the network traffic file includes:
extracting session information of each data packet in the network traffic file, wherein the session information includes a protocol name; and
summarizing, according to the field types corresponding to each protocol type, the field contents that correspond to the same protocol name in the network traffic file.
Optionally, the service database includes a protocol whitelist composed of the protocols to which service data belongs;
the matching the protocol information against the preset service database to determine the proportion of service data in the network traffic file includes:
screening out, from the network traffic file, target protocol names that match the protocol whitelist;
summarizing the data packets corresponding to the target protocol names to obtain the number of data packets that match the protocol whitelist; and
determining the proportion of service data in the network traffic file based on that number of data packets and the total number of data packets in the network traffic file.
Optionally, the method further includes:
when the proportion is smaller than the preset threshold, setting a port blacklist according to the port information corresponding to the data packets that do not match the protocol whitelist.
Optionally, the service database further includes a service data blacklist composed of non-service data;
after the number of data packets matching the protocol whitelist is determined according to the protocol names corresponding to the protocol information, the method further includes:
screening out, from the data packets that match the protocol whitelist, target data packets that do not match the service data blacklist; and
taking the number of the target data packets as the final number of data packets.
Optionally, the service database further includes an IP blacklist composed of the IPs of non-service data;
after the screening out, from the data packets that match the protocol whitelist, of the target data packets that do not match the service data blacklist, the method further includes:
screening out, from the target data packets, those that do not match the IP blacklist;
the taking the number of the target data packets as the final number of data packets includes:
taking the number of target data packets that do not match the IP blacklist as the final number of data packets.
Optionally, after the determining that the network traffic file is a valid file, the method further includes:
generating an identification report of the network traffic file, wherein the identification report includes the protocol names contained in the network traffic file, the field contents corresponding to the same protocol name, and/or the proportion of service data in the network traffic file.
An embodiment of the present application further provides a network traffic file identification device, which includes a parsing unit, a matching unit, and a determining unit;
the parsing unit is configured to parse the acquired network traffic file according to a preset protocol parsing mode to obtain the protocol information of the network traffic file;
the matching unit is configured to match the protocol information against a preset service database to determine the proportion of service data in the network traffic file; and
the determining unit is configured to determine that the network traffic file is a valid file when the proportion is greater than or equal to a preset threshold.
Optionally, the parsing unit includes an extraction subunit and a summarizing subunit;
the extraction subunit is configured to extract session information of each data packet in the network traffic file, wherein the session information includes a protocol name; and
the summarizing subunit is configured to summarize, according to the field types corresponding to each protocol type, the field contents that correspond to the same protocol name in the network traffic file.
Optionally, the service database includes a protocol whitelist composed of the protocols to which service data belongs; the matching unit includes a screening subunit, a summarizing subunit, and a determining subunit;
the screening subunit is configured to screen out, from the network traffic file, target protocol names that match the protocol whitelist;
the summarizing subunit is configured to summarize the data packets corresponding to the target protocol names to obtain the number of data packets that match the protocol whitelist; and
the determining subunit is configured to determine the proportion of service data in the network traffic file based on that number of data packets and the total number of data packets in the network traffic file.
Optionally, the device further includes a setting unit;
the setting unit is configured to set a port blacklist according to the port information corresponding to the data packets that do not match the protocol whitelist when the proportion is smaller than the preset threshold.
Optionally, the service database further includes a service data blacklist composed of non-service data; the device further includes a data screening unit;
the data screening unit is configured to screen out, from the data packets that match the protocol whitelist, target data packets that do not match the service data blacklist;
and the unit is configured to take the number of the target data packets as the final number of data packets.
Optionally, the service database further includes an IP blacklist composed of the IPs of non-service data; the device further includes an IP screening unit;
the IP screening unit is configured to screen out, from the target data packets, those that do not match the IP blacklist;
and the unit is configured to take the number of target data packets that do not match the IP blacklist as the final number of data packets.
Optionally, the device further includes a generating unit;
the generating unit is configured to generate an identification report of the network traffic file after the network traffic file is determined to be a valid file, wherein the identification report includes the protocol names contained in the network traffic file, the field contents corresponding to the same protocol name, and/or the proportion of service data in the network traffic file.
An embodiment of the present application further provides an electronic device, including:
a memory configured to store a computer program; and
a processor configured to execute the computer program to implement the steps of the network traffic file identification method described above.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the network traffic file identification method described above.
According to the above technical solution, the acquired network traffic file is parsed according to a preset protocol parsing mode to obtain its protocol information, which reflects the protocol used by each data packet in the file. The distribution of service data is strongly correlated with protocols: service data is usually carried in the data packets of certain specific protocols. A service database can therefore be constructed based on protocols, and the protocol information is matched against this service database to determine the proportion of service data in the network traffic file. When the proportion is greater than or equal to the preset threshold, the network traffic file is determined to be a valid file. By exploiting this correlation, the scheme automatically identifies whether a network traffic file contains the required service data without manual analysis, which improves the efficiency of service data identification.
Drawings
For a clearer description of embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described, it being apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for those skilled in the art.
Fig. 1 is a schematic view of a scenario for identifying a network traffic file according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for identifying a network traffic file according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an identification device for network traffic files according to an embodiment of the present application;
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without making any inventive effort are within the scope of the present application.
The terms "comprising" and "having" and any variations thereof in the description and claims of the application and in the foregoing drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.
In order to better understand the aspects of the present application, the present application will be described in further detail with reference to the accompanying drawings and detailed description.
Different types of devices generate different traffic data, and fingerprint information can be generated from that data. Fingerprint information is unique, so different devices can be identified based on it. The key to generating fingerprint information is that the captured network traffic file must contain a large amount of service data. At present, service data is identified mainly by manual analysis, which is inefficient and error-prone for network traffic files with a large data volume.
Accordingly, the embodiments of the present application provide a network traffic file identification method and device, an electronic device, and a computer-readable storage medium. Service data is strongly correlated with protocols and is usually carried in the data packets of certain specific protocols, so in the present application a service database can be constructed based on protocols. By analyzing the protocol information contained in a network traffic file, the method can then automatically identify whether the file contains the required service data.
Fig. 1 is a schematic view of a scenario for identifying a network traffic file according to an embodiment of the present application. To analyze network traffic files automatically, a service database can be constructed in advance on a server, based on protocol information that is strongly correlated with service data. A device's network traffic file can be obtained by packet capture. To identify service data in the network traffic file based on protocols, the acquired file is parsed according to a preset protocol parsing mode to obtain its protocol information, and the protocol information is matched against the preset service database to determine the proportion of service data in the file. When the proportion is greater than or equal to a preset threshold, the network traffic file contains a large amount of service data and is determined to be a valid file. This implementation automatically identifies whether a network traffic file contains the required service data without manual analysis, improving both the efficiency and the accuracy of service data identification.
Next, a method for identifying a network traffic file provided by the embodiment of the present application is described in detail.
Fig. 2 is a flowchart of a method for identifying a network traffic file according to an embodiment of the present application, where the method includes:
S201: Parse the acquired network traffic file according to a preset protocol parsing mode to obtain the protocol information of the network traffic file.
The embodiments of the present application take into account the strong correlation between service data and protocols. For example, in a medical customer's network, service data typically appears in the application-layer HTTP (Hypertext Transfer Protocol) and in DICOM (Digital Imaging and Communications in Medicine), a protocol specific to the medical field. Some proprietary, unknown protocols are carried over the transport-layer TCP (Transmission Control Protocol) and UDP (User Datagram Protocol), so service data may also be contained in the payloads of unidentified application-layer protocols running over TCP and UDP. By contrast, no service data appears in protocols such as the application-layer DNS (Domain Name System) or the network-layer ICMP (Internet Control Message Protocol).
Whether a network traffic file contains service data can therefore be identified from the protocol information the file contains.
A network traffic file usually contains many data packets, each of which corresponds to one protocol type. To make it easier to evaluate later how much service data the file contains, the data in the network traffic file can be classified and summarized by protocol type.
In a specific implementation, session information of each data packet in the network traffic file can be extracted. The session information may be the five-tuple of the data packet: source IP, destination IP, source port, destination port, and protocol name.
Different protocol types define different field types. For a protocol that carries service data, the content of each field under that protocol can often serve as service data.
After the session information of each data packet is extracted, the field contents corresponding to the same protocol name in the network traffic file can be summarized according to the field types of each protocol type.
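As a minimal Python sketch of this summarizing step, assume each parsed data packet has already been reduced to a protocol name plus a mapping from field type to field content. This representation, and the example field values, are hypothetical; the patent does not fix a data structure.

```python
from collections import defaultdict


def summarize_fields(packets):
    """Merge the field contents of all packets that share a protocol name.

    packets: iterable of (protocol_name, {field_type: field_content}) pairs,
    a hypothetical representation of already-parsed data packets.
    """
    summary = defaultdict(lambda: defaultdict(list))
    for protocol, fields in packets:
        for field_type, content in fields.items():
            summary[protocol][field_type].append(content)
    # Convert the nested defaultdicts to plain dicts for printing/serialization.
    return {protocol: dict(fields) for protocol, fields in summary.items()}


# Hypothetical example packets.
packets = [
    ("HTTP", {"Host": "pacs.example", "User-Agent": "Viewer/2.1"}),
    ("HTTP", {"Host": "pacs.example"}),
    ("DNS", {"Query": "pacs.example"}),
]
summary = summarize_fields(packets)
print(summary)
```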
PCAP is a common packet storage format, and captured network traffic files are often stored as PCAP files. A PCAP file consists of a file header followed by a sequence of records: packet header 1, packet data 1, packet header 2, packet data 2, and so on. The traffic data can be parsed from the PCAP file. A PCAP file usually contains many data packets, and the five-tuple of each one (source IP, destination IP, source port, destination port, and protocol name) can be extracted from it. For each data packet, its hexadecimal content is first examined to determine the packet length, which locates the start and end of the packet; the content is then converted into readable text with a hexadecimal conversion tool. This is repeated for every packet, and finally the field contents of packets of the same protocol, that is, with the same protocol name, are merged together.
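The record layout described above can be sketched in Python with only the standard library. This is an illustrative sketch, not the patent's parser: it assumes the classic libpcap format with microsecond timestamps (not pcapng), Ethernet frames, and IPv4, and it derives the protocol name from a small hypothetical table of well-known ports, whereas a real dissector inspects the payload itself.

```python
import ipaddress
import struct

IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}
# Hypothetical port-to-protocol-name table; real dissectors inspect payloads.
WELL_KNOWN_PORTS = {53: "DNS", 80: "HTTP", 104: "DICOM"}


def read_pcap(data: bytes):
    """Yield the raw bytes of each packet record in a classic pcap file."""
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian file
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    offset = 24  # skip the 24-byte global file header
    while offset + 16 <= len(data):
        # Packet header: ts_sec, ts_usec, incl_len (captured length), orig_len.
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        yield data[offset:offset + incl_len]  # packet data
        offset += incl_len


def five_tuple(frame: bytes):
    """Return (src_ip, dst_ip, src_port, dst_port, protocol) or None."""
    # The Ethernet header is 14 bytes; EtherType 0x0800 means IPv4.
    if len(frame) < 34 or struct.unpack_from("!H", frame, 12)[0] != 0x0800:
        return None
    ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
    proto_num = frame[23]
    src_ip = str(ipaddress.IPv4Address(frame[26:30]))
    dst_ip = str(ipaddress.IPv4Address(frame[30:34]))
    protocol = IP_PROTOCOLS.get(proto_num, str(proto_num))
    src_port = dst_port = 0
    if proto_num in (6, 17) and len(frame) >= 14 + ihl + 4:
        src_port, dst_port = struct.unpack_from("!HH", frame, 14 + ihl)
        protocol = WELL_KNOWN_PORTS.get(
            dst_port, WELL_KNOWN_PORTS.get(src_port, protocol))
    return src_ip, dst_ip, src_port, dst_port, protocol


def build_udp_frame(src, dst, sport, dport):
    """Build a minimal Ethernet/IPv4/UDP frame for demonstration only."""
    ip = (bytes([0x45, 0]) + struct.pack("!H", 28) + bytes(4) + bytes([64, 17])
          + bytes(2) + ipaddress.IPv4Address(src).packed
          + ipaddress.IPv4Address(dst).packed)
    udp = struct.pack("!HHHH", sport, dport, 8, 0)
    return bytes(12) + struct.pack("!H", 0x0800) + ip + udp


frame = build_udp_frame("10.0.0.5", "10.0.0.9", 50000, 53)
pcap = (struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
        + struct.pack("<IIII", 0, 0, len(frame), len(frame)) + frame)
tuples = [five_tuple(f) for f in read_pcap(pcap)]
print(tuples)  # [('10.0.0.5', '10.0.0.9', 50000, 53, 'DNS')]
```

Reading the captured length (`incl_len`) from each packet header is what locates the end of one packet and the start of the next, which corresponds to the "judge the length, then find the start and end position" step in the text.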
S202: Match the protocol information against the preset service database to determine the proportion of service data in the network traffic file.
In the embodiments of the present application, the service database can be built based on protocols. For example, the service data contained in historical network traffic files can be analyzed and the protocol names to which that service data belongs summarized, yielding a protocol whitelist. As more historical network traffic files accumulate, new protocol names carrying service data may appear, so in practice the protocol names in the protocol whitelist can be continuously adjusted and updated.
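One way to sketch the whitelist construction and update, assuming an analyst has already labeled packets from historical network traffic files as carrying service data or not. The labeled-pair input and the `min_count` noise filter are hypothetical additions, not taken from the patent.

```python
from collections import Counter


def build_protocol_whitelist(history, min_count=1):
    """Collect protocol names that carried service data in historical files.

    history: iterable of (protocol_name, carries_service_data) pairs.
    min_count: hypothetical filter that ignores protocols seen too rarely.
    """
    counts = Counter(protocol for protocol, is_service in history if is_service)
    return {protocol for protocol, n in counts.items() if n >= min_count}


def update_whitelist(whitelist, new_history, min_count=1):
    """Extend an existing whitelist with protocols found in newer files."""
    return whitelist | build_protocol_whitelist(new_history, min_count)


history = [("HTTP", True), ("DNS", False), ("DICOM", True), ("TCP", True)]
whitelist = build_protocol_whitelist(history)
whitelist = update_whitelist(whitelist, [("MQTT", True), ("ICMP", False)])
print(sorted(whitelist))  # ['DICOM', 'HTTP', 'MQTT', 'TCP']
```

Because the update is a set union, protocols are only ever added; removing a protocol that later proves noisy would need a separate, explicit step.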
Each data packet has corresponding protocol information, so parsing the network traffic file yields the data packets corresponding to each protocol name.
To evaluate the proportion of service data in the network traffic file, the target protocol names that match the protocol whitelist are screened out from the file; the data packets corresponding to the target protocol names are summarized to obtain the number of data packets matching the protocol whitelist; and the proportion of service data is determined from this number and the total number of data packets in the file.
For example, assume that a network traffic file contains 100 data packets covering 10 protocol types, with 10 data packets per protocol type, and that 8 of the 10 protocol names match the protocol whitelist. The number of data packets matching the protocol whitelist is then 10 × 8 = 80, the total number of data packets is 100, and the proportion of service data in the network traffic file is 80/100 = 80%.
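The proportion calculation in this example can be reproduced with a short sketch. The protocol names `P0` to `P9` are placeholders, and representing a file as a flat list of per-packet protocol names is an assumption made for brevity.

```python
def service_data_proportion(packet_protocols, whitelist):
    """Fraction of data packets whose protocol name is in the whitelist."""
    total = len(packet_protocols)
    if total == 0:
        return 0.0  # an empty file contains no service data
    matched = sum(1 for protocol in packet_protocols if protocol in whitelist)
    return matched / total


# Reproduce the worked example: 10 protocol types x 10 packets each,
# with 8 of the 10 (placeholder-named) types on the whitelist.
protocol_types = [f"P{i}" for i in range(10)]
packet_protocols = [p for p in protocol_types for _ in range(10)]
whitelist = set(protocol_types[:8])
proportion = service_data_proportion(packet_protocols, whitelist)
print(proportion)  # 0.8
```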
S203: When the proportion is greater than or equal to the preset threshold, determine that the network traffic file is a valid file.
The higher the proportion of service data in a network traffic file, the more service data the file contains. A network traffic file is valuable for asset identification only when it contains a large amount of service data. Therefore, in the embodiments of the present application, a threshold can be set against which the proportion of service data is evaluated.
When the proportion is greater than or equal to the preset threshold, the network traffic file contains a large amount of service data and is worth analyzing further, so it can be determined to be a valid file.
The preset threshold can be set according to actual requirements: higher when the quality requirement for network traffic files is high, and lower when it is not.
For example, the preset threshold may be set to 80%. Since the proportion of service data determined above is 80%, which equals the threshold, the network traffic file can be determined to be a valid file and used for subsequent asset identification.
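The threshold test in S203 reduces to one comparison. The sketch below assumes the threshold is given as a whole-number percentage (a hypothetical parameter choice) and compares cross-multiplied integers instead of a floating-point ratio, so a file sitting exactly on the threshold, like the 80% example, is never misjudged by rounding.

```python
def is_valid_file(matched, total, threshold_percent=80):
    """Return True when matched/total >= threshold_percent / 100.

    Comparing cross-multiplied integers avoids floating-point rounding
    exactly at the threshold boundary.
    """
    return total > 0 and matched * 100 >= threshold_percent * total


print(is_valid_file(80, 100))  # True: 80% equals the 80% threshold
print(is_valid_file(79, 100))  # False
```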
In this way, the protocol information, which reflects the protocol used by each data packet, is matched against a protocol-based service database, and a network traffic file whose proportion of service data reaches the preset threshold is determined to be a valid file. Because service data is strongly correlated with protocols, whether a network traffic file contains the required service data is identified automatically, without manual analysis, which improves the efficiency of service data identification.
In practice, the proportion may turn out to be smaller than the preset threshold, which indicates that the network traffic file contains little service data. Because the network traffic file is captured on particular ports, capturing again on the same ports is likely to yield a new network traffic file that still contains little or no service data.
Therefore, in the embodiments of the present application, to improve the success rate of capturing useful network traffic files, when the proportion is smaller than the preset threshold, a port blacklist can be set according to the port information of the data packets that do not match the protocol whitelist.
When the proportion is below the preset threshold, there are usually many data packets that do not match the protocol whitelist, with many distinct ports among them. Adding all of those ports to the port blacklist would greatly reduce the volume of data collected in subsequent captures. In practice, therefore, the ports of the data packets that do not match the protocol whitelist can be classified and counted to obtain how often each port occurs in the network traffic file, and the N most frequent ports are added to the port blacklist. N can be set according to actual requirements, for example N = 3.
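The top-N selection can be sketched with `collections.Counter`. The patent does not say whether the source or destination port is counted; this sketch assumes each packet is reduced to a `(protocol_name, port)` pair using the server-side (destination) port, and the example traffic is invented.

```python
from collections import Counter


def top_n_port_blacklist(packets, whitelist, n=3):
    """Return the n most frequent ports among non-whitelisted packets.

    packets: iterable of (protocol_name, port) pairs, where port is assumed
    to be the server-side (destination) port of the data packet.
    """
    counts = Counter(port for protocol, port in packets if protocol not in whitelist)
    return [port for port, _ in counts.most_common(n)]


packets = ([("DNS", 53)] * 5 + [("NTP", 123)] * 4 + [("SSDP", 1900)] * 3
           + [("LLMNR", 5355)] + [("HTTP", 80)] * 10)
port_blacklist = top_n_port_blacklist(packets, whitelist={"HTTP"}, n=3)
print(port_blacklist)  # [53, 123, 1900]
```

`Counter.most_common` orders ports with equal counts by first occurrence, so ties at the N-th position are broken by capture order; a deployment may want an explicit tie-breaking rule instead.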
In practice, besides using the port information of the data packets that do not match the protocol whitelist, a separate protocol blacklist can also be set when building the port blacklist. The protocol blacklist may contain the names of protocols that carry no service data.
When the proportion is smaller than the preset threshold, the protocol names contained in the network traffic file are compared with the protocol blacklist, and the port blacklist is set according to the port information of the data packets that match the protocol blacklist.
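A sketch of the protocol-blacklist variant, under the same hypothetical `(protocol_name, port)` packet representation. Here the ports come from packets that positively match a list of protocols known to carry no service data, and nothing is blacklisted when the file already meets the threshold.

```python
def ports_from_protocol_blacklist(packets, protocol_blacklist, proportion,
                                  threshold=0.8):
    """Collect ports of packets whose protocol is on the protocol blacklist.

    Only applied when the service data proportion is below the threshold.
    packets: iterable of (protocol_name, port) pairs.
    """
    if proportion >= threshold:
        return []  # the file is already valid; no port blacklist is needed
    return sorted({port for protocol, port in packets if protocol in protocol_blacklist})


packets = [("DNS", 53), ("SSDP", 1900), ("HTTP", 80), ("DNS", 53), ("NTP", 123)]
print(ports_from_protocol_blacklist(packets, {"DNS", "NTP"}, proportion=0.2))
# [53, 123]
```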
With the port blacklist in place, subsequent captures of network traffic files can avoid the ports it contains, that is, no traffic is captured on those ports. This improves the quality of the captured network traffic files so that they contain as much service data as possible.
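The patent does not name a capture tool. If captures are taken with tcpdump or another libpcap-based tool (an assumption), the port blacklist can be turned into a Berkeley Packet Filter (BPF) capture expression; in that syntax the `port` primitive matches either the source or the destination port. A minimal sketch:

```python
def capture_filter(port_blacklist):
    """Build a BPF expression that excludes every blacklisted port."""
    ports = sorted(set(port_blacklist))
    if not ports:
        return ""  # an empty filter captures everything
    return "not (" + " or ".join(f"port {p}" for p in ports) + ")"


expression = capture_filter([123, 53, 1900, 53])
print(expression)  # not (port 53 or port 123 or port 1900)
```

The resulting string could then be passed to a capture command such as `tcpdump -i eth0 -w capture.pcap 'not (port 53 or port 123 or port 1900)'`, where the interface and output file names are placeholders.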
In practice, a network traffic file may contain data that is useless for asset identification, such as an enterprise's intranet data. The protocol names of such data may be present in the protocol whitelist, which distorts the computed proportion of service data in the network traffic file.
Therefore, in the embodiments of the present application, to reduce the influence of useless data on the proportion, a service data blacklist can be constructed from the useless data, that is, the non-service data, once it has been obtained. Accordingly, the service database may further include a service data blacklist composed of non-service data.
In a specific implementation, feature analysis can be performed on the useless data, the data converted into hexadecimal, and regular expression rules generated from it. The regular expression rules for all of the useless data are collected to form the service data blacklist.
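The patent does not specify how features are extracted from useless data. As one hypothetical choice, the sketch below takes the longest hexadecimal prefix shared by a set of useless-data samples as the feature, anchors it as a regular expression, and matches packet payloads converted to hexadecimal against the collected rules. The sample payloads are invented for illustration.

```python
import os
import re


def rule_from_samples(samples):
    """Derive one anchored regex rule from payload samples of useless data."""
    prefix = os.path.commonprefix([s.hex() for s in samples])
    prefix = prefix[: len(prefix) - len(prefix) % 2]  # keep whole bytes only
    if not prefix:
        raise ValueError("samples share no common prefix")
    # Anchoring at the start keeps the match aligned to byte boundaries.
    return re.compile("^" + re.escape(prefix))


def matches_blacklist(payload, rules):
    """True if the payload's hex form matches any service data blacklist rule."""
    hex_payload = payload.hex()
    return any(rule.search(hex_payload) for rule in rules)


blacklist = [rule_from_samples([b"\x00\x01INTRANET-HB:1", b"\x00\x01INTRANET-HB:2"])]
print(matches_blacklist(b"\x00\x01INTRANET-HB:9", blacklist))  # True
print(matches_blacklist(b"GET / HTTP/1.1\r\n", blacklist))     # False
```

A common-prefix rule is deliberately crude; rules matching fixed offsets or byte patterns in the middle of a payload would need unanchored expressions that take care to stay aligned on even hex positions.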
After determining the number of data packets matched with the protocol white list according to the protocol names corresponding to the protocol information, screening target data packets which are not matched with the business data black list from the data packets matched with the protocol white list; the number of the target data packets is taken as the final data packet number.
In addition to setting up a blacklist of service data, IP is a factor that affects service data, and in practical applications, some IPs cannot obtain useful service data, for example, in a medical client scenario, some IPs are scanner IPs, and network traffic flowing through a scanner may simultaneously include service data with multiple devices, which may interfere with asset identification analysis. Therefore, in the embodiment of the application, the IP blacklist can be formed based on the IP of the non-service data. Accordingly, the traffic database may also include an IP blacklist consisting of the IP of the non-traffic data.
After the target data packets that do not match the service data blacklist have been screened out from the data packets matched with the protocol whitelist, the target data packets that do not match the IP blacklist are further screened out from them, and the number of target data packets that do not match the IP blacklist is taken as the final data packet number.
In the embodiment of the present application, the order in which the data packets are compared with the service data blacklist and with the IP blacklist is not limited. The description above takes as an example the order in which the target data packets not matching the service data blacklist are screened out first from the data packets matched with the protocol whitelist. In practical applications, the target data packets that do not match the IP blacklist may instead be screened out first from the data packets matched with the protocol whitelist, after which the target data packets that do not match the service data blacklist are screened out from them.
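Because each screening step keeps or drops a packet based only on that packet, the steps act as independent filters, and applying them in either order yields the same final data packet number. The sketch below illustrates this under stated assumptions: the service data blacklist check is reduced to a precomputed boolean, a packet is assumed to match the IP blacklist when either its source or destination address is listed, and all names and addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Packet:
    protocol: str
    src_ip: str
    dst_ip: str
    # Stand-in for the hex regex check against the service data blacklist
    hits_service_blacklist: bool = False


Keep = Callable[[Packet], bool]


def in_whitelist(whitelist: Set[str]) -> Keep:
    return lambda p: p.protocol in whitelist


def not_in_service_blacklist(p: Packet) -> bool:
    return not p.hits_service_blacklist


def not_in_ip_blacklist(ip_blacklist: Set[str]) -> Keep:
    # Assumption: a packet matches the IP blacklist if either endpoint is listed
    return lambda p: p.src_ip not in ip_blacklist and p.dst_ip not in ip_blacklist


def apply_filters(packets: List[Packet], filters: List[Keep]) -> List[Packet]:
    """Apply each keep-predicate in turn, narrowing the list of target packets."""
    for keep in filters:
        packets = [p for p in packets if keep(p)]
    return packets


packets = [
    Packet("HTTP", "10.0.0.5", "10.0.0.9"),
    Packet("HTTP", "10.0.0.99", "10.0.0.9"),        # 10.0.0.99: example scanner IP
    Packet("DICOM", "10.0.0.7", "10.0.0.8", True),  # carries blacklisted content
    Packet("DICOM", "10.0.0.7", "10.0.0.8"),
    Packet("SSDP", "10.0.0.3", "239.255.255.250"),  # protocol not whitelisted
]
wl = in_whitelist({"HTTP", "DICOM"})
ip_bl = not_in_ip_blacklist({"10.0.0.99"})
order_a = apply_filters(packets, [wl, not_in_service_blacklist, ip_bl])
order_b = apply_filters(packets, [wl, ip_bl, not_in_service_blacklist])
print(len(order_a), order_a == order_b)  # 2 True
```

In practice the order may still matter for performance: running the cheaper IP lookup before the regex scan avoids hex-encoding packets that would be discarded anyway.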
By setting the service data blacklist and the IP blacklist, the data content of the network traffic file can be analyzed further, beyond the protocol types it contains, so that data packets carrying no service data are discarded. This prevents such packets from distorting the calculation of the duty ratio, improves the accuracy of the duty ratio, and allows a more accurate assessment of whether the network traffic file can be used as a valid file.
In the embodiment of the application, when the network traffic file is parsed, the data it contains can be classified and summarized by protocol type. To help subsequent operators understand how service data is distributed in the network traffic file, an identification report of the network traffic file may be generated after the network traffic file is determined to be a valid file. The identification report may include the protocol names contained in the network traffic file, the field contents corresponding to each protocol name, and/or the duty ratio of service data in the network traffic file.
Generating the identification report presents the parsed data intuitively in list form, so that an operator can see directly how service data is distributed in the network traffic file.
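A plain-text rendering of such a report might look like the sketch below. The layout, the function name `build_report` and the sample values are assumptions; the source only states that the report may contain protocol names, per-protocol field contents and/or the duty ratio, displayed in list form.

```python
from typing import Dict, List


def build_report(file_name: str,
                 fields_by_protocol: Dict[str, Dict[str, List[str]]],
                 duty_ratio: float) -> str:
    """Render a plain-text identification report as a list."""
    lines = [
        f"Identification report: {file_name}",
        f"Duty ratio of service data: {duty_ratio:.1%}",
    ]
    for protocol in sorted(fields_by_protocol):
        lines.append(f"[{protocol}]")
        for field, values in sorted(fields_by_protocol[protocol].items()):
            unique = list(dict.fromkeys(values))  # de-duplicate, keep first-seen order
            lines.append(f"  {field}: {', '.join(unique)}")
    return "\n".join(lines)


report = build_report(
    "capture.pcap",
    {"HTTP": {"host": ["pacs.local", "pacs.local", "ris.local"]},
     "DNS": {"query": ["pacs.local"]}},
    0.75,
)
print(report)
```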
Fig. 3 is a schematic structural diagram of a device for identifying a network traffic file according to an embodiment of the present application. The device includes a parsing unit 31, a matching unit 32, and a determining unit 33;
The parsing unit 31 is configured to parse the acquired network traffic file according to a preset protocol parsing mode to obtain protocol information of the network traffic file;
The matching unit 32 is configured to match the protocol information with a preset service database and determine the duty ratio of service data in the network traffic file;
The determining unit 33 is configured to determine that the network traffic file is a valid file if the duty ratio is greater than or equal to a preset threshold.
Optionally, the parsing unit includes an extracting subunit and a summarizing subunit;
The extracting subunit is configured to extract session information of each data packet in the network traffic file, where the session information includes a protocol name;
The summarizing subunit is configured to summarize, according to the field categories corresponding to each protocol type, the field contents corresponding to the same protocol name in the network traffic file.
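The following sketch shows one way the extracting and summarizing subunits could group field contents by protocol name. It assumes an upstream dissector (not shown) has already turned each packet into a dictionary of session information with a `protocol` key; the `FIELD_CATEGORIES` table and its entries are hypothetical, since the source does not list the field categories for any protocol.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical field categories per protocol type; the source does not list them.
FIELD_CATEGORIES: Dict[str, List[str]] = {
    "HTTP": ["host", "user_agent"],
    "DNS": ["query"],
}


def summarize_fields(sessions: List[Dict[str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Group distinct field contents by protocol name, per that protocol's categories."""
    summary: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for session in sessions:
        protocol = session["protocol"]  # protocol name from the session information
        for field in FIELD_CATEGORIES.get(protocol, []):
            value = session.get(field)
            if value is not None and value not in summary[protocol][field]:
                summary[protocol][field].append(value)
    # Convert the nested defaultdicts to plain dicts for display and comparison
    return {proto: dict(fields) for proto, fields in summary.items()}


sessions = [
    {"protocol": "HTTP", "host": "pacs.local", "user_agent": "viewer/2.1"},
    {"protocol": "HTTP", "host": "pacs.local"},
    {"protocol": "DNS", "query": "pacs.local"},
    {"protocol": "SSDP", "st": "upnp:rootdevice"},  # no categories defined: skipped
]
print(summarize_fields(sessions))
```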
Optionally, the service database includes a protocol whitelist composed of the protocols to which service data belong; the matching unit includes a screening subunit, a summarizing subunit, and a determining subunit;
The screening subunit is configured to screen out, from the network traffic file, target protocol names that match the protocol whitelist;
The summarizing subunit is configured to summarize the data packets corresponding to the target protocol names to obtain the number of data packets matched with the protocol whitelist;
The determining subunit is configured to determine the duty ratio of service data in the network traffic file based on that number of data packets and the total number of data packets in the network traffic file.
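The three subunits of the matching unit, together with the threshold check of the determining unit, can be sketched as follows, assuming the parsing unit yields one protocol name per data packet. The whitelist entries and the threshold of 0.6 are illustrative values only; the source specifies neither.

```python
from collections import Counter
from typing import Iterable, Mapping, Set


def duty_ratio(packet_protocols: Iterable[str], whitelist: Set[str]) -> float:
    """Share of packets whose protocol name is in the protocol whitelist."""
    per_protocol: Mapping[str, int] = Counter(packet_protocols)  # packets per protocol name
    # Screening subunit: protocol names present in the whitelist
    target_names = [name for name in per_protocol if name in whitelist]
    # Summarizing subunit: number of packets matched with the whitelist
    matched = sum(per_protocol[name] for name in target_names)
    # Determining subunit: matched packets over all packets in the file
    total = sum(per_protocol.values())
    return matched / total if total else 0.0


def is_valid_file(ratio: float, threshold: float) -> bool:
    """Determining unit: the file is valid when the ratio reaches the threshold."""
    return ratio >= threshold


protocols = ["HTTP", "HTTP", "DICOM", "SSDP", "NBNS", "HTTP", "DICOM", "SSDP"]
ratio = duty_ratio(protocols, {"HTTP", "DICOM"})
print(ratio, is_valid_file(ratio, threshold=0.6))  # 0.625 True
```

Counting per protocol name first means the whitelist lookup runs once per distinct protocol rather than once per packet, which matches the "screen names, then summarize packets" split of the subunits.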
Optionally, the device further comprises a setting unit;
The setting unit is configured to set a port blacklist according to the port information of the data packets that do not match the protocol whitelist when the duty ratio is smaller than the preset threshold.
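Claim 1 below adds detail to this step: the ports of packets that do not match the protocol whitelist are counted, the N most frequent ports are added to the port blacklist, and traffic is no longer captured from those ports. A minimal sketch, assuming each packet is reduced to a `(protocol_name, port)` pair; which port (source or destination) is counted, the value of N, and all function names are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def build_port_blacklist(packets: Iterable[Tuple[str, int]], whitelist: Set[str],
                         duty_ratio: float, threshold: float, top_n: int) -> List[int]:
    """Return the top_n most frequent ports among packets outside the protocol whitelist.

    Assumption: each packet is a (protocol_name, port) pair; whether the port is
    the source or destination port is not specified in the source.
    """
    if duty_ratio >= threshold:
        return []  # the file is already valid; no port blacklist is needed
    port_counts = Counter(port for protocol, port in packets if protocol not in whitelist)
    return [port for port, _count in port_counts.most_common(top_n)]


def should_capture(port: int, port_blacklist: Set[int]) -> bool:
    """Capture-side check: skip traffic on blacklisted ports."""
    return port not in port_blacklist


packets = [("SSDP", 1900)] * 5 + [("NBNS", 137)] * 3 + [("LLMNR", 5355)] + [("HTTP", 80)] * 2
blacklist = build_port_blacklist(packets, {"HTTP"}, duty_ratio=2 / 11, threshold=0.6, top_n=2)
print(blacklist)  # [1900, 137]
```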
Optionally, the service database further includes a service data blacklist composed of non-service data; the device further includes a data screening unit;
The data screening unit is configured to screen out, from the data packets matched with the protocol whitelist, target data packets that do not match the service data blacklist;
The number of target data packets is taken as the final data packet number.
Optionally, the service database further includes an IP blacklist composed of the IP addresses of non-service data; the device further includes an IP screening unit;
The IP screening unit is configured to screen out, from the target data packets, target data packets that do not match the IP blacklist;
The number of target data packets that do not match the IP blacklist is taken as the final data packet number.
Optionally, the device further comprises a generating unit;
The generating unit is configured to generate an identification report of the network traffic file after the network traffic file is determined to be a valid file; the identification report includes the protocol names contained in the network traffic file, the field contents corresponding to each protocol name, and/or the duty ratio of service data in the network traffic file.
For a description of the features of the embodiment corresponding to Fig. 3, refer to the related description of the embodiment corresponding to Fig. 2; details are not repeated here.
According to the technical solution, the acquired network traffic file is parsed according to a preset protocol parsing mode to obtain its protocol information, which reflects the protocol used by each data packet in the file. The distribution of service data is strongly correlated with protocol: service data are usually carried in data packets of certain specific protocols, so the service database can be built on the basis of protocols. The protocol information is matched with the preset service database to determine the duty ratio of service data in the network traffic file, and when the duty ratio is greater than or equal to the preset threshold, the network traffic file is determined to be a valid file. Because the scheme relies on the strong correlation between service data and protocol, analyzing the protocol information in a network traffic file is enough to identify automatically whether the file contains the required service data, without manual analysis, which improves the efficiency of identifying service data.
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 4, the electronic device includes: a memory 20, configured to store a computer program;
and a processor 21, configured to implement the steps of the network traffic file identification method in the above embodiments when executing the computer program.
The electronic device provided in this embodiment may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like.
The processor 21 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 21 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 21 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor 21 may also include an AI (Artificial Intelligence) processor for computing operations related to machine learning.
The memory 20 may include one or more computer-readable storage media, which may be non-transitory. The memory 20 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 20 stores at least a computer program 201 which, when loaded and executed by the processor 21, implements the relevant steps of the network traffic file identification method disclosed in any of the foregoing embodiments. The resources stored in the memory 20 may further include an operating system 202, data 203, and the like, stored transiently or permanently. The operating system 202 may include Windows, Unix, Linux, and the like. The data 203 may include, but is not limited to, protocol information and the service database.
In some embodiments, the electronic device may further include a display 22, an input-output interface 23, a communication interface 24, a power supply 25, and a communication bus 26.
Those skilled in the art will appreciate that the structure shown in fig. 4 is not limiting of the electronic device and may include more or fewer components than shown.
It will be appreciated that, if the network traffic file identification method in the above embodiments is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and used to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, a magnetic disk, or an optical disk.
Based on this, the embodiment of the invention further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the network traffic file identification method as described above.
The functions of each functional module of the computer readable storage medium according to the embodiments of the present invention may be specifically implemented according to the method in the embodiments of the method, and the specific implementation process may refer to the relevant description of the embodiments of the method, which is not repeated herein.
The method, device, electronic equipment, and computer-readable storage medium for identifying a network traffic file provided by the embodiments of the application have been described in detail above. The embodiments in this description are described progressively, each focusing on its differences from the others; for identical or similar parts, the embodiments may refer to one another. Because the device disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for relevant details, refer to the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The principles and embodiments of the present application have been described herein with reference to specific examples, which are intended only to help understand the method of the present application and its core ideas. It should be noted that those skilled in the art may make various improvements and modifications to the application without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of the application.
Claims (8)
1. A method for identifying a network traffic file, comprising:
parsing the acquired network traffic file according to a preset protocol parsing mode to obtain protocol information of the network traffic file, wherein a service database comprises a protocol whitelist composed of the protocols to which service data belong, and further comprises a service data blacklist composed of non-service data;
matching the protocol information with the preset service database to determine the duty ratio of service data in the network traffic file, which comprises: screening out, from the data packets matched with the protocol whitelist, target data packets that do not match the service data blacklist; and determining the duty ratio of service data in the network traffic file based on the number of target data packets and the total number of data packets in the network traffic file;
determining that the network traffic file is a valid file when the duty ratio is greater than or equal to a preset threshold; and
setting a port blacklist according to the port information of the data packets that do not match the protocol whitelist when the duty ratio is smaller than the preset threshold, which comprises: classifying and summarizing the ports of the data packets that do not match the protocol whitelist to obtain the count of each port in the network traffic file, and adding the top N ports with the highest counts to the port blacklist; and avoiding the ports in the port blacklist, so that network traffic files are not captured from those ports.
2. The method for identifying a network traffic file according to claim 1, wherein parsing the acquired network traffic file according to the preset protocol parsing mode to obtain the protocol information of the network traffic file comprises:
extracting session information of each data packet in the network traffic file, wherein the session information includes a protocol name; and
summarizing, according to the field categories corresponding to each protocol type, the field contents corresponding to the same protocol name in the network traffic file.
3. The method for identifying a network traffic file according to claim 2, wherein matching the protocol information with the preset service database to determine the duty ratio of service data in the network traffic file comprises:
screening out, from the network traffic file, target protocol names that match the protocol whitelist; and
summarizing the data packets corresponding to the target protocol names to obtain the number of data packets matched with the protocol whitelist.
4. The method of claim 1, wherein the service database further comprises an IP blacklist composed of the IP addresses of non-service data;
after screening out, from the data packets matched with the protocol whitelist, the target data packets that do not match the service data blacklist, the method further comprises:
screening out, from the target data packets, target data packets that do not match the IP blacklist;
and taking the number of the target data packets as the final data packet number comprises:
taking the number of target data packets that do not match the IP blacklist as the final data packet number.
5. The method of claim 2, further comprising, after said determining that said network traffic file is a valid file:
generating an identification report of the network traffic file, wherein the identification report comprises the protocol names contained in the network traffic file, the field contents corresponding to each protocol name, and/or the duty ratio of service data in the network traffic file.
6. A device for identifying a network traffic file, comprising a parsing unit, a matching unit, and a determining unit;
the parsing unit is configured to parse the acquired network traffic file according to a preset protocol parsing mode to obtain protocol information of the network traffic file, wherein a service database comprises a protocol whitelist composed of the protocols to which service data belong, and further comprises a service data blacklist composed of non-service data;
the matching unit is configured to match the protocol information with the preset service database to determine the duty ratio of service data in the network traffic file, which comprises: screening out, from the data packets matched with the protocol whitelist, target data packets that do not match the service data blacklist; and determining the duty ratio of service data in the network traffic file based on the number of target data packets and the total number of data packets in the network traffic file;
the determining unit is configured to determine that the network traffic file is a valid file when the duty ratio is greater than or equal to a preset threshold;
the device further comprises a setting unit;
the setting unit is configured to set a port blacklist according to the port information of the data packets that do not match the protocol whitelist when the duty ratio is smaller than the preset threshold, which comprises: classifying and summarizing the ports of the data packets that do not match the protocol whitelist to obtain the count of each port in the network traffic file, and adding the top N ports with the highest counts to the port blacklist; and avoiding the ports in the port blacklist, so that network traffic files are not captured from those ports.
7. An electronic device, comprising:
A memory for storing a computer program;
A processor for executing the computer program to perform the steps of the method for identifying a network traffic file according to any of claims 1 to 5.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method for identifying network traffic files according to any of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111632893.8A CN114338436B (en) | 2021-12-28 | 2021-12-28 | Network traffic file identification method and device, electronic equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114338436A CN114338436A (en) | 2022-04-12 |
CN114338436B true CN114338436B (en) | 2024-08-16 |
Family
ID=81014698
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114338436B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104348811A (en) * | 2013-08-05 | 2015-02-11 | 深圳市腾讯计算机系统有限公司 | Method and device for detecting attack of DDoS (distributed denial of service) |
CN110011962A (en) * | 2019-02-21 | 2019-07-12 | 国家计算机网络与信息安全管理中心 | A kind of recognition methods of car networking business datum |
CN111277570A (en) * | 2020-01-10 | 2020-06-12 | 中电长城网际系统应用有限公司 | Data security monitoring method and device, electronic equipment and readable medium |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6651099B1 (en) * | 1999-06-30 | 2003-11-18 | Hi/Fn, Inc. | Method and apparatus for monitoring traffic in a network |
EP2819365A1 (en) * | 2013-06-24 | 2014-12-31 | Alcatel Lucent | Network traffic inspection |
US9462014B1 (en) * | 2015-04-23 | 2016-10-04 | Datiphy Inc. | System and method for tracking and auditing data access in a network environment |
US10785111B2 (en) * | 2017-05-22 | 2020-09-22 | Netscout Systems, Inc | Fault-tolerant monitoring of tunneled IP flows |
CN107592303B (en) * | 2017-08-28 | 2020-01-03 | 北京明朝万达科技股份有限公司 | Method and device for extracting outgoing files in high-speed mirror image network traffic |
CN111131070B (en) * | 2019-12-19 | 2023-04-07 | 北京浩瀚深度信息技术股份有限公司 | Port time sequence-based network traffic classification method and device and storage medium |
CN113381962B (en) * | 2020-02-25 | 2023-02-03 | 深信服科技股份有限公司 | Data processing method, device and storage medium |
CN111628941A (en) * | 2020-05-27 | 2020-09-04 | 广东浪潮大数据研究有限公司 | Network traffic classification processing method, device, equipment and medium |
CN111901300B (en) * | 2020-06-24 | 2023-02-03 | 武汉绿色网络信息服务有限责任公司 | A method and device for classifying network traffic |
CN111901327B (en) * | 2020-07-21 | 2022-07-26 | 平安科技(深圳)有限公司 | Cloud network vulnerability mining method and device, electronic equipment and medium |
CN112235160B (en) * | 2020-10-14 | 2022-02-01 | 福建奇点时空数字科技有限公司 | Flow identification method based on protocol data deep layer detection |
CN112350956B (en) * | 2020-10-23 | 2022-07-01 | 新华三大数据技术有限公司 | Network traffic identification method, device, equipment and machine readable storage medium |
CN113746849A (en) * | 2021-09-07 | 2021-12-03 | 深信服科技股份有限公司 | Method, device, equipment and storage medium for identifying equipment in network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |