CN113553589B - Extraction method, device and application of malicious software propagation characteristics
- Publication number
- CN113553589B (application CN202110870400.8A)
- Authority
- CN
- China
- Legal status
- Active
Classifications
- G06F21/562: Static detection (under G06F21/56, Computer malware detection or handling, e.g. anti-virus arrangements)
- G06F18/211: Selection of the most significant subset of features
- G06F18/22: Matching criteria, e.g. proximity measures
- G06F18/23: Clustering techniques
- G06F21/561: Virus type analysis (under G06F21/56, Computer malware detection or handling, e.g. anti-virus arrangements)
Abstract
The application provides a method for extracting malware propagation characteristics, in the technical field of network security. The method comprises: acquiring the network traffic of malware on a terminal; parsing the network traffic into a plurality of session logs, each containing a plurality of feature metrics; dividing the session logs into one or more log groups by log type, and clustering the session logs in each log group according to the feature metrics to obtain one or more feature classes; and extracting a representative feature from each feature class and generating a malware feature vector from the representative features of the log groups, wherein each log group comprises one or more representative features. By parsing the network traffic generated by the malware into session logs and clustering on the feature metrics those logs contain, the method extracts a malware feature vector that measures the propagation behavior of the running malware. The extracted malware feature vectors can further be used to build a malware feature library for malware detection.
Description
Technical Field
The application relates to the technical field of network security, and in particular to a method, a device and an application for extracting malware propagation characteristics.
Background
With the growing popularity of networks, attackers increasingly use malware to compromise user terminals and carry out a range of malicious behaviors, causing security problems such as leakage of user privacy and economic loss. Discovering and removing malware in time has therefore become important for network security.
A traditional malware detection device runs on the terminal and is usually preloaded with a large malware feature library containing static and dynamic features of known malware; when it finds software on the terminal that matches features in the library, it determines that the software is malware. Later, to reduce the performance load on the terminal, many detection devices moved the detection function to a server, but collection software still had to be installed on the terminal and still consumed its resources.
Techniques that detect malware without installing any detection software on the terminal are also being actively researched. A common approach is clustering: network communication data generated by a large number of terminals is collected, all terminals producing that data are grouped with a common clustering algorithm such as k-means, and the terminals in a cluster containing only a few terminals are considered likely to be running malware. The drawbacks of this approach are that its detection accuracy is very low, so the results can serve only as a reference, and that it cannot identify the specific type and name of the malware.
Disclosure of Invention
In a first aspect, an embodiment of the present application provides a method for extracting malware propagation characteristics. The method parses the network traffic generated by malware into session logs and clusters them according to the feature metrics they contain, so as to extract a malware feature vector that measures the propagation behavior of the running malware.
Specifically, the method comprises the following steps:
acquiring the network traffic of malware on a terminal;
parsing the network traffic into a plurality of session logs, wherein each session log includes a plurality of feature metrics;
dividing the session logs into one or more log groups according to log type, and clustering the session logs in each log group according to the feature metrics to obtain one or more feature classes;
and extracting a corresponding representative feature from each feature class, and generating a malware feature vector from the representative features of the log groups, wherein each log group comprises one or more representative features.
In some embodiments, generating the malware feature vector from the representative features of the log groups includes: merging the representative features in each log group to obtain a representative feature set for each log type, and then combining all the representative feature sets in a specified order to form the malware feature vector, wherein each log group comprises one or more representative features.
In some embodiments of the application, there are one or more terminals. One or more high-value feature classes are extracted according to the number of session logs contained in each feature class and the number of terminals covered by each feature class, and the malware feature vector is generated from the feature metrics in the high-value feature classes.
In some embodiments, the network traffic of the malware is collected as follows: the malware is installed on one or more virtual terminals connected to the same virtualized network, and a network mirroring service is configured in the virtualized network to collect the network traffic of the malware.
In particular, to describe the propagation behavior of the malware more fully, some embodiments of the present application do not only extract features from each session log itself, but also add time-dimension statistical features associated with the session logs to the malware feature vector.
In a second aspect, the method for extracting malware propagation characteristics is applied to software detection: an embodiment of the present application further provides a malware detection method, comprising:
acquiring the malware feature vector and a target feature vector of the target software;
and calculating the similarity between the target feature vector and the malware feature vector, and determining that the target software is malware if the similarity exceeds a set threshold.
In a third aspect, based on the same concept, an embodiment of the present application further provides a device for extracting malware propagation characteristics. The device implements the method for extracting malware propagation characteristics and comprises:
an acquisition module, configured to acquire the network traffic of malware on a terminal;
an analysis module, configured to parse the network traffic into a plurality of session logs, wherein each session log includes a plurality of feature metrics;
a clustering module, configured to divide the session logs into one or more log groups according to log type and to cluster the session logs in each log group according to the feature metrics to obtain one or more feature classes;
and a feature extraction module, configured to generate a malware feature vector from the representative features of the log groups, wherein each log group comprises one or more representative features.
In a fourth aspect, the present application provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the method for extracting malware propagation characteristics or the malware detection method of any of the above embodiments.
In a fifth aspect, an embodiment of the present application provides a computer program product comprising a program or instructions which, when run on a computer, cause the computer to perform the method for extracting malware propagation characteristics or the malware detection method of any of the above embodiments.
In a sixth aspect, the present application provides a readable storage medium storing a computer program, wherein the computer program includes program code for controlling a process to execute a process, the process including the method for extracting malware propagation characteristics or the malware detection method of any of the above embodiments.
In the method, device and application for extracting malware propagation characteristics, the network traffic generated by the malware is parsed into session logs, and the session logs are clustered according to the feature metrics they contain to extract a malware feature vector that measures the propagation behavior of the running malware.
The embodiments of the present application have three distinguishing features. First, the propagation behavior of the malware is not measured directly from the clustering result; instead, the number of session logs in each feature class and the number of terminals each feature class covers are considered together to select high-value feature classes for the measurement. Second, besides the propagation features extracted from the session logs themselves, time-dimension statistical features associated with the session logs are added to measure the propagation behavior. Third, the device running the extraction method can be deployed on a server and only needs to obtain and process the network traffic of the target software; it does not consume the terminal's performance resources, which greatly reduces the load on the terminal.
It is worth noting that the extraction method can be applied to a large number of malware samples to extract their malware feature vectors and build a malware feature library for malware detection.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of a malware propagation feature extraction method according to an embodiment of the present application;
FIG. 2 is a block diagram of a malware propagation feature extraction device according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims that follow.
It should be noted that in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the methods may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps in other embodiments, and multiple steps described in this specification may be combined into a single step in other embodiments.
Example one
This embodiment provides a method for extracting malware propagation characteristics.
Referring to fig. 1, fig. 1 is a flowchart of a method for extracting malware propagation characteristics according to an embodiment of the present application. As shown in FIG. 1, the method includes steps S1-S4:
Step S1: acquiring the network traffic of the malware on the terminal.
In this step, a virtualized network is built using virtualization tools. It contains one or more virtual terminals running a currently common operating system, a DHCP server and a DNS server; all of these devices are connected to a single virtual network, a network mirroring service is configured, and the network traffic of all devices in the virtual network is directed to a traffic collection server. Many software platforms can be used to build the virtualized network, such as OpenStack.
Next, a single piece of malware is run on any one virtual terminal in the virtualized network. Before it runs, the virtual terminal must not be running any other unrelated programs, and its firewall must be disabled. Specifically, the malware may be configured to start automatically after the operating system boots on the virtual terminal and to run for a short time.
The network traffic generated in the virtual network after the malware runs is captured on the traffic collection server. Note that this way of acquiring malware network traffic is only an example and does not limit the source of malware network traffic in the present application. That is, the network traffic in this step may be traffic captured after running the malware on a virtual terminal as described above, but in practice it may also be malware traffic collected directly on the Internet, or traffic obtained in other ways.
Step S2: parsing the network traffic into a plurality of session logs, wherein each session log includes a plurality of feature metrics.
The network traffic collected in step S1 is parsed into session logs, which include at least session logs of the TCP/IP transport layer and may also include session logs of the application layer. Specifically, the parsed session logs include at least three of the following types: TCP, UDP, DNS, DHCP and HTTP. The metrics contained in each type of session log are: source IP, destination IP, source port, destination port, session duration, session uplink data length, session downlink data length and session connection state. HTTP session logs additionally contain the request URL, request parameters, domain name, request method, etc.
The more distinctive of these metrics are selected as the feature metrics of the session log. The feature metrics selected in this embodiment are: the session log type, source port, destination port, session duration, session uplink data length, session downlink data length and session connection state, plus, for HTTP session logs, the request URL, request parameters, domain name, request method, etc.
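A minimal sketch of a parsed session log and of the feature-metric selection described above, assuming Python 3.9+. The class name `SessionLog`, its field names, the `terminal` and `start_time` fields, and the integer encoding of the connection state are illustrative assumptions, not taken from the application. The log type stays a separate field because step S3 uses it for grouping, and the HTTP-only metrics are left out because they would need their own text encoding.

```python
from dataclasses import dataclass


# Hypothetical record for one parsed session log; field names are illustrative.
@dataclass
class SessionLog:
    log_type: str      # "TCP", "UDP", "DNS", "DHCP" or "HTTP"
    terminal: str      # terminal that produced the session (assumed field)
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    start_time: float  # seconds since capture start (assumed field)
    duration: float    # session duration in seconds
    up_bytes: int      # session uplink data length
    down_bytes: int    # session downlink data length
    conn_state: int    # connection state encoded as an integer (assumption)


def feature_metrics(log: SessionLog) -> list[float]:
    """Return the numeric feature metrics used for clustering.

    Source and destination IPs are dropped, as in the embodiment; the log
    type is used for grouping rather than as a vector component here.
    """
    return [
        float(log.src_port),
        float(log.dst_port),
        log.duration,
        float(log.up_bytes),
        float(log.down_bytes),
        float(log.conn_state),
    ]
```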
Step S3: dividing the session logs into one or more log groups according to log type, and clustering the session logs in each log group according to the feature metrics to obtain one or more feature classes.
In this embodiment, the session logs obtained in step S2 are first grouped by log type to obtain one or more log groups. The session logs in each log group are then clustered with a density-based method according to the feature metrics; for example, the DBSCAN algorithm assigns session logs with similar feature metrics to the same feature class. Density-based clustering is used here because it does not require the number of clusters to be specified in advance, and in practice it is not known how many feature classes there will be.
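The grouping and density-based clustering of step S3 can be sketched as below. This is a small plain-Python DBSCAN written for illustration (in practice a library implementation such as scikit-learn's `DBSCAN` would more likely be used). The Euclidean distance, the `eps` and `min_pts` values, the `(log_type, vector)` input layout, and dropping noise points are all assumptions the application does not specify.

```python
import math
from collections import defaultdict

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over equal-length numeric vectors.

    Returns one label per point; NOISE (-1) marks outliers.
    """
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Euclidean neighbourhood, including the point itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neigh = neighbours(j)
            if len(j_neigh) >= min_pts:  # j is also a core point: expand
                queue.extend(j_neigh)
        cluster += 1
    return labels


def cluster_by_log_type(records, eps, min_pts):
    """Group (log_type, vector) pairs by log type, then cluster each group.

    Returns {log_type: {cluster_id: [vector, ...]}}; noise points are
    discarded in this sketch.
    """
    groups = defaultdict(list)
    for log_type, vec in records:
        groups[log_type].append(vec)
    result = {}
    for log_type, vecs in groups.items():
        classes = defaultdict(list)
        for vec, label in zip(vecs, dbscan(vecs, eps, min_pts)):
            if label != NOISE:
                classes[label].append(vec)
        result[log_type] = dict(classes)
    return result
```

Because DBSCAN grows clusters from dense regions, the number of feature classes falls out of `eps` and `min_pts` rather than being fixed up front, which matches the reason given above for choosing density-based clustering.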
Step S4: extracting a corresponding representative feature from each feature class, and generating a malware feature vector from the representative features of the log groups, wherein each log group comprises one or more representative features.
First, the cluster center of each feature class is calculated with the standard method used by clustering algorithms: the points in each cluster are visited in order, the sum of the distances from each point to all points in the current cluster is computed, and the point with the smallest sum is taken as the center point. The center point is used as the representative feature of its feature class.
In other words, each feature class uses the session log at its computed center point as its representative feature, and each log group contains one or more such representative features. The representative features in each log group are merged into a representative feature set for that log group, i.e. for that log type, and all the representative feature sets are combined in a prescribed order to form the malware feature vector.
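The center-point rule described in step S4 (the point with the smallest summed distance to the rest of its cluster, often called the medoid) can be sketched as follows. Euclidean distance is an assumption, since the application does not name the distance measure; with raw metrics such as byte counts and port numbers, some per-metric scaling would probably be needed before distances are meaningful.

```python
import math


def medoid(points):
    """Return the point with the smallest summed Euclidean distance to all
    points in the cluster, i.e. the center point described in step S4."""
    best, best_sum = None, math.inf
    for p in points:  # visit candidate points in order
        total = sum(math.dist(p, q) for q in points)
        if total < best_sum:  # strict '<' keeps the first point on ties
            best, best_sum = p, total
    return best
```

Unlike an arithmetic mean, the medoid is always an actual session log from the feature class, which is why it can stand in directly as that class's representative feature.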
Specifically, in this embodiment, the representative feature sets are arranged in the log-type order TCP, UDP, DNS, DHCP, HTTP to form the malware feature vector. If there is no representative feature set for a certain log type, the usual approach is to fill its position with 0 or a fixed random value, so that the dimensions stay consistent.
For example, in some embodiments the ordering covers three types of session logs: TCP, UDP and HTTP. The representative feature set of the TCP session logs is [5535, 80, 1.63, 16456, 345] and that of the HTTP session logs is [60035, 8080, 4.63, 4432, 10333, 0.4532]. No UDP session logs exist, so the UDP positions are all filled with a random number that is almost impossible to occur as a real value, such as 13467834. The final malware feature vector is therefore [[5535, 80, 1.63, 16456, 345], [13467834, 13467834, 13467834, 13467834], [60035, 8080, 4.63, 4432, 10333, 0.4532]]. This example only illustrates the filling rule; it does not represent actual malware feature vector data, nor does it limit the actual form of the malware feature vector.
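A sketch of assembling the malware feature vector in a fixed log-type order with filler padding, as in the example above. The constant names, the `dims` parameter giving a fixed width per log type, flattening the nested sets into one list, and truncating over-long sets are assumptions made here so that every vector has the same length; the filler value 13467834 is the one used in the example.

```python
LOG_TYPE_ORDER = ["TCP", "UDP", "DNS", "DHCP", "HTTP"]  # order from this embodiment
FILLER = 13467834  # "almost impossible to repeat" value from the example


def build_feature_vector(rep_sets, dims, order=LOG_TYPE_ORDER, filler=FILLER):
    """Concatenate per-log-type representative feature sets in a fixed order.

    rep_sets: {log_type: [number, ...]} flattened representative features.
    dims:     {log_type: int} fixed width reserved for each log type
              (hypothetical parameter).
    """
    vector = []
    for log_type in order:
        values = list(rep_sets.get(log_type, []))
        width = dims[log_type]
        # Pad a missing or short set with the filler; truncate a long one
        # (truncation is an assumption, not stated in the application).
        vector.extend((values + [filler] * width)[:width])
    return vector
```

For the three-type example above, calling it with `order=["TCP", "UDP", "HTTP"]` and widths 5, 4 and 6 reproduces the flattened form of the example vector.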
In other embodiments, to improve the quality of the representative feature sets and eliminate unnecessary feature classes, the feature classes obtained in step S3 are sorted by the number of session logs they contain. Clustering may produce many feature classes, but a feature class containing a large number of similar session logs is clearly unsuitable as a screening feature; feature classes that contain few session logs and differ markedly from the other session logs are better suited. Feature classes containing fewer session logs are therefore more valuable.
Next, the session logs in each feature class obtained in step S3 are grouped by terminal, and the number of distinct groups in each feature class, i.e. the number of distinct terminals the feature class covers, is counted. Feature classes that cover more terminals are preferred: if the session logs of a feature class appear on many different terminals, more terminals generate such session logs, and those session logs are more valuable. For example, with 6 terminals in total, if the session logs of feature class A appear on 2 terminals and those of feature class B appear on 5 terminals, feature class B is clearly more valuable.
Finally, each feature class's positions in both rankings are considered together to select the high-value feature classes. Specifically, the feature classes are ranked from most to least valuable under each criterion, and the average of each feature class's two ranks is calculated, so that the value of each feature class is assessed on both criteria and the selected feature classes are more reliable. A fixed proportion of all feature classes, those with the best average rank, is then selected as the high-value feature classes. High-value representative features are extracted from these classes, the high-value representative features in each log group are merged into a high-value representative feature set for that log group (i.e. log type), and all the high-value representative feature sets are combined in the specified order to form the malware feature vector.
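The two rankings and the average-rank selection above can be sketched as follows. The input layout (a dict of feature classes holding `(terminal_id, vector)` pairs), the `proportion` parameter, rounding the kept count up with at least one class kept, and breaking ties by input order are assumptions, since the application leaves these details open.

```python
import math


def select_high_value(classes, proportion):
    """Rank feature classes two ways and keep the best fixed proportion.

    classes: {class_id: [(terminal_id, vector), ...]} (hypothetical layout).
    Ranking 1: fewer session logs is more valuable (ascending size).
    Ranking 2: more distinct terminals covered is more valuable (descending).
    Classes are ordered by the average of the two ranks (lower is better);
    ties keep input order because sorted() is stable.
    """
    ids = list(classes)
    size = {c: len(classes[c]) for c in ids}
    coverage = {c: len({t for t, _ in classes[c]}) for c in ids}

    by_size = sorted(ids, key=lambda c: size[c])
    by_cover = sorted(ids, key=lambda c: -coverage[c])
    rank_size = {c: r for r, c in enumerate(by_size, start=1)}
    rank_cover = {c: r for r, c in enumerate(by_cover, start=1)}

    avg_rank = {c: (rank_size[c] + rank_cover[c]) / 2 for c in ids}
    ordered = sorted(ids, key=lambda c: avg_rank[c])
    keep = max(1, math.ceil(proportion * len(ids)))  # assumed rounding rule
    return ordered[:keep]
```

Averaging ranks rather than raw values keeps the two criteria on the same scale, since session-log counts and terminal counts can differ by orders of magnitude.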
The malware feature vector obtained so far only reflects the feature metrics of each session log itself, which may not be enough to characterize the propagation behavior of the malware, so time-dimension statistical features associated with the session logs can also be added. In this embodiment, the number of session logs generated within a fixed time window, the number of destination-terminal connections within a fixed time window, and the sequence of time intervals between all session logs are added to the malware feature vector as time-dimension statistical features. A terminal generates a number of session logs, and each session log has a source terminal and a destination terminal: the source terminal is the terminal itself and the destination terminal is the receiver it communicates with, so the number of destination-terminal connections is the number of receivers of the session logs generated by the terminal.
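A sketch of the three time-dimension statistics for one terminal. The `(start_time, destination)` input layout, the `window` length, windows counted from time 0, and counting *distinct* destinations per window are all assumptions; the application only names the three statistics.

```python
from collections import defaultdict


def time_features(sessions, window):
    """Time-dimension statistics for one terminal's session logs.

    sessions: list of (start_time_seconds, destination) tuples (hypothetical).
    window:   fixed window length in seconds (assumed parameter).
    Returns per-window session counts, per-window distinct destination
    counts, and the intervals between consecutive sessions.
    """
    sessions = sorted(sessions)
    counts = defaultdict(int)
    dests = defaultdict(set)
    for t, dst in sessions:
        w = int(t // window)  # index of the window this session falls in
        counts[w] += 1
        dests[w].add(dst)
    n_windows = int(sessions[-1][0] // window) + 1 if sessions else 0
    per_window_sessions = [counts[w] for w in range(n_windows)]
    per_window_dests = [len(dests[w]) for w in range(n_windows)]
    intervals = [b[0] - a[0] for a, b in zip(sessions, sessions[1:])]
    return per_window_sessions, per_window_dests, intervals
```

These lists vary in length with the capture duration, so before they are appended to a fixed-length malware feature vector they would need the same kind of padding or truncation as the representative feature sets.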
Example two
This embodiment applies the method for extracting malware propagation characteristics to software detection and provides a malware detection method comprising the following steps:
acquiring the malware feature vector and a target feature vector of the target software;
and calculating the similarity between the target feature vector and the malware feature vector, and determining that the target software is malware if the similarity exceeds a set threshold.
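The application does not name the similarity measure. The sketch below assumes cosine similarity over equal-length feature vectors and a hypothetical default threshold of 0.9. Because large filler values and raw byte counts can dominate a cosine score, the metrics would probably need normalizing first, which this sketch does not do.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_malicious(target, malware, threshold=0.9):
    """Flag the target when its similarity to the malware vector exceeds
    the threshold (0.9 is an illustrative value, not from the source)."""
    return cosine_similarity(target, malware) > threshold
```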
The extraction method yields one malware feature vector per piece of malware, and a larger number of malware samples can be collected for feature extraction to build a malware feature library. Specifically, malware samples may be obtained from known antivirus vendors, threat intelligence vendors, websites and the like. In this embodiment, the collected malware is classified by type in advance, and the number of samples of each malware type should be as balanced as possible. The malware feature vector of each sample is then extracted with the method for extracting malware propagation characteristics to form the malware feature library.
Once the malware feature library has been built, a detection device holding the library can run on a server. According to the malware detection method of this embodiment, one or more terminals to be checked are connected to the server and their network traffic is collected; target feature vectors are extracted with the method for extracting malware propagation characteristics; the similarity between each target feature vector and the malware feature vectors in the library is calculated, and the malware feature vector with the highest similarity is taken as the comparison result. When that similarity exceeds a certain threshold, the target software is considered to match the propagation characteristics of that malware, and the corresponding terminal is traced, so that terminals infected by malware can be located and the specific malware type can be confirmed from its propagation characteristics. The advantage is that no monitoring or collection program needs to be installed on the terminal, so no terminal performance is consumed.
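A sketch of the library lookup described above: find the most similar malware feature vector and report it only if the similarity exceeds the threshold. The dict layout of the library, the function name `match_library`, and the use of cosine similarity are assumptions for illustration; the cosine helper is repeated so the block runs on its own.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_library(target, library, threshold):
    """Return (malware_type, similarity) of the best match above the
    threshold, or None if no library entry is similar enough.

    library: {malware_type: feature_vector} (hypothetical layout).
    """
    best_name, best_sim = None, -math.inf
    for name, vec in library.items():
        sim = cosine_similarity(target, vec)
        if sim > best_sim:  # keep the most similar library entry
            best_name, best_sim = name, sim
    if best_name is not None and best_sim > threshold:
        return best_name, best_sim
    return None
```

Taking the single best match before applying the threshold mirrors the order described above, and returning the matched type is what allows the specific malware type to be reported for the traced terminal.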
EXAMPLE III
Based on the same concept, the present embodiment further provides a device for extracting malware propagation features. Referring to fig. 2, which is a structural block diagram of the device according to an embodiment of the present application, the device implements the malware propagation feature extraction method and comprises the following modules:
the acquisition module is used for acquiring the network traffic of the malware on a terminal;
the analysis module is used for parsing the network traffic into a plurality of session logs, wherein the session logs include a plurality of feature indicators;
the clustering module is used for dividing the session logs into one or more log groups according to log type, and clustering the session logs in each log group according to the feature indicators to obtain one or more feature classes;
and the feature extraction module is used for extracting corresponding representative features from each feature class and combining the representative features of different log groups to generate a malware feature vector, wherein each log group comprises one or more representative features.
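A minimal sketch of how the feature extraction module might concatenate representative features in a fixed group order. The group names (`conn`, `dns`, `http`, `ssl`), the number of slots per group, and zero-padding are all assumptions, chosen so that every sample yields a vector of the same length, which a vector similarity comparison requires; the text itself only says that the representative features of different log groups are combined.

```python
# Hypothetical fixed order of log groups; the patent only says "a specified order".
GROUP_ORDER = ["conn", "dns", "http", "ssl"]


def build_feature_vector(rep_features, slots_per_group=2, dim=3):
    """Concatenate per-group representative features in GROUP_ORDER.

    rep_features maps log type -> list of representative feature vectors, each
    of length `dim`. Each group gets `slots_per_group` slots: missing slots are
    zero-padded and extra representatives are dropped, so the output length is
    always len(GROUP_ORDER) * slots_per_group * dim.
    """
    vector = []
    for group in GROUP_ORDER:
        reps = rep_features.get(group, [])[:slots_per_group]
        for rep in reps:
            if len(rep) != dim:
                raise ValueError(f"{group}: expected {dim} values, got {len(rep)}")
            vector.extend(rep)
        # Pad unused slots of this group with zeros.
        vector.extend([0.0] * dim * (slots_per_group - len(reps)))
    return vector
```

Fixing the slot layout per group keeps a given position in the vector tied to the same log type across samples, which is what makes position-wise similarity meaningful.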
Example four
Referring to fig. 3, the present embodiment further provides an electronic apparatus including a memory 304 and a processor 302. The memory 304 stores a computer program, and the processor 302 is configured to run the computer program to perform the steps of the malware propagation feature extraction method of the first embodiment or the steps of the malware detection method of the second embodiment.
Specifically, the processor 302 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 304 may be used to store or cache various initialization data files to be processed and/or used for communication, and may also store the computer program instructions executed by the processor 302.
The processor 302 reads and executes the computer program instructions stored in the memory 304 to implement the malware propagation feature extraction method of the first embodiment or the malware detection method of the second embodiment.
Optionally, the electronic apparatus may further include a transmission device 306 and an input/output device 308, both of which are connected to the processor 302.
The transmission device 306 may be used to receive or transmit data via a network. Specific examples of the network include a wired or wireless network provided by the communication provider of the electronic apparatus. In one example, the transmission device includes a Network Interface Controller (NIC) that can connect to other network devices through a base station to communicate with the internet. In another example, the transmission device 306 may be a Radio Frequency (RF) module used to communicate with the internet wirelessly.
The input/output device 308 is used to input or output information and may be, for example, a display screen, a mouse, a keyboard, or another device. In this embodiment, the input device is used to input acquired information, which may be data, tables, images, or real-time video; the output information may be text, charts, alarm information and the like displayed by the service system.
Optionally, in this embodiment, the processor 302 may be configured to execute the following steps by means of a computer program:
acquiring the network traffic of malware on a terminal;
parsing the network traffic into a plurality of session logs, wherein the session logs include a plurality of feature indicators;
dividing the session logs into one or more log groups according to log type, and clustering the session logs in each log group according to the feature indicators to obtain one or more feature classes;
and extracting corresponding representative features from each feature class, and combining the representative features of different log groups to generate a malware feature vector, wherein the log groups comprise one or more representative features.
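The grouping, clustering and representative-feature steps above can be sketched as below. Claim 1 specifies density-based clustering but names no particular algorithm; DBSCAN is used here as one common density-based method, written in plain Python to stay self-contained. The `log_type`/`features` record layout, the `eps`/`min_pts` values, and taking each cluster's centroid as its representative feature are all assumptions.

```python
import math
from collections import defaultdict


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over numeric vectors; returns one label per point, -1 = noise."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point; not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_seeds = region(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is a core point: keep growing the cluster
    return labels


def cluster_log_groups(session_logs, eps=1.0, min_pts=2):
    """Group session logs by log type, cluster each group, and return cluster centroids.

    session_logs: list of dicts with a "log_type" key and a numeric "features"
    list (hypothetical layout). Returns {log_type: [centroid, ...]}; noise is ignored.
    """
    groups = defaultdict(list)
    for log in session_logs:
        groups[log["log_type"]].append(log["features"])
    result = {}
    for log_type, points in groups.items():
        labels = dbscan(points, eps, min_pts)
        members = defaultdict(list)
        for point, label in zip(points, labels):
            if label != -1:
                members[label].append(point)
        # Centroid of each feature class, used here as its representative feature.
        result[log_type] = [
            [sum(col) / len(col) for col in zip(*pts)]
            for _, pts in sorted(members.items())
        ]
    return result
```

The pairwise `region` search is quadratic in the number of logs per group; a production version would use a spatial index, but the clustering logic is the same.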
In other embodiments, the processor 302 may be further configured to execute the following steps by a computer program:
acquiring the malware feature vector and a target feature vector of target software;
and calculating the similarity between the target feature vector and the malware feature vector, and if the similarity exceeds a set threshold, determining that the target software is malware.
In addition, in combination with the malware propagation feature extraction method of the first embodiment or the malware detection method of the second embodiment, the embodiments of the present application may be implemented as a computer program product. The computer program product includes a program or instructions which, when run on a computer, cause the computer to execute the malware propagation feature extraction method of the first embodiment or the malware detection method of the second embodiment.
In addition, in combination with the malware propagation feature extraction method of the first embodiment or the malware detection method of the second embodiment, the embodiments of the present application may provide a readable storage medium for implementing the method. A computer program is stored on the readable storage medium; when executed by a processor, the computer program implements the malware propagation feature extraction method of the first embodiment or the malware detection method of the second embodiment.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, by hardware, or by a combination of software and hardware. Computer software or programs (also called program products), including software routines, applets and/or macros, can be stored in any device-readable data storage medium and include program instructions for performing particular tasks. The computer program product may comprise one or more computer-executable components configured to perform the embodiments when the program is run. The one or more computer-executable components may be at least one piece of software code or a portion thereof. In this regard, it should further be noted that any block of the logic flow in the figures may represent a program step, an interconnected logic circuit, block and function, or a combination of a program step and a logic circuit, block and function. The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as DVDs and their data variants, and CDs. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that the technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these features are described; however, any such combination should be considered within the scope of this disclosure as long as it involves no contradiction.
The above examples merely illustrate several embodiments of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the scope of protection of the present application shall be subject to the appended claims.
Claims (5)
1. A method for extracting malware propagation features, characterized by comprising the following steps:
establishing a virtualized network by means of virtualization, wherein the virtualized network comprises one or more virtual terminals installed with currently common operating systems, a DHCP server and a DNS server; connecting these devices to a unified virtual network; configuring a network mirroring service to direct the network traffic of all devices in the virtual network to a traffic collection server; and running a single piece of malware on any virtual terminal in the virtualized network to acquire the network traffic of the malware on the terminal;
parsing the network traffic into a plurality of session logs, wherein the session logs include a plurality of feature indicators;
dividing the session logs into one or more log groups according to log type, and performing density-based clustering on the session logs in each log group according to the feature indicators to obtain one or more feature classes; selecting one or more high-value feature classes according to the number of session logs and the number of terminals contained in each feature class;
extracting high-value representative features from the high-value feature classes, combining the high-value representative features in each log group to obtain a high-value representative feature set for each log group, concatenating all the high-value representative feature sets in a specified order to form a malware feature vector, and adding time-dimension statistical features associated with the session logs to the malware feature vector, wherein each log group comprises one or more high-value representative features.
2. A malware detection method for detecting software, characterized by comprising the following steps:
acquiring, according to claim 1, a malware feature vector and a target feature vector of target software;
and calculating the similarity between the target feature vector and the malware feature vector, and if the similarity exceeds a set threshold, determining that the target software is malware.
3. A device for extracting malware propagation features, characterized by comprising:
the acquisition module, used for establishing a virtualized network by means of virtualization, wherein the virtualized network comprises one or more virtual terminals installed with currently common operating systems, a DHCP server and a DNS server; connecting these devices to a unified virtual network; configuring a network mirroring service to direct the network traffic of all devices in the virtual network to a traffic collection server; and running a single piece of malware on any virtual terminal in the virtualized network to acquire the network traffic of the malware on the terminal;
the analysis module, used for parsing the network traffic into a plurality of session logs, wherein the session logs include a plurality of feature indicators;
the clustering module, used for dividing the session logs into one or more log groups according to log type, performing density-based clustering on the session logs in each log group according to the feature indicators to obtain one or more feature classes, and selecting one or more high-value feature classes according to the number of session logs and the number of terminals contained in each feature class;
the feature extraction module, used for extracting high-value representative features from the high-value feature classes, combining the high-value representative features in each log group to obtain a high-value representative feature set for each log group, concatenating all the high-value representative feature sets in a specified order to form a malware feature vector, and adding time-dimension statistical features associated with the session logs to the malware feature vector, wherein each log group comprises one or more high-value representative features.
4. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the malware propagation feature extraction method of claim 1 or the malware detection method of claim 2.
5. A readable storage medium having stored therein a computer program comprising program code for controlling a process to execute a process, the process comprising the malware propagation feature extraction method of claim 1 or the malware detection method of claim 2.
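Claim 1 selects high-value feature classes by the number of session logs and the number of terminals in each class, and then appends time-dimension statistics, without stating the selection criteria or which statistics are used. The sketch below shows one plausible reading, with every detail an assumption: a class is kept when it has at least `min_logs` session logs spread across at least `min_terminals` virtual terminals (traffic reaching several terminals hinting at propagation), and the time features are the session count plus the mean and population standard deviation of inter-session gaps. All names, record layouts and thresholds are hypothetical.

```python
from statistics import mean, pstdev


def select_high_value(feature_classes, min_logs=5, min_terminals=2):
    """Keep feature classes backed by enough session logs across enough terminals.

    feature_classes: {class_id: [(terminal_id, timestamp), ...]} (hypothetical layout).
    The two thresholds are illustrative assumptions, not values from the patent.
    """
    selected = []
    for class_id, logs in feature_classes.items():
        terminals = {terminal for terminal, _ in logs}
        if len(logs) >= min_logs and len(terminals) >= min_terminals:
            selected.append(class_id)
    return sorted(selected)


def time_statistics(timestamps):
    """Example time-dimension features: [session count, mean gap, std of gaps] (seconds)."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:
        return [float(len(ts)), 0.0, 0.0]
    return [float(len(ts)), mean(gaps), pstdev(gaps)]
```

The time statistics would be appended to the end of the assembled feature vector, after the per-group representative features.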
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110870400.8A CN113553589B (en) | 2021-07-30 | 2021-07-30 | Extraction method, device and application of malicious software propagation characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110870400.8A CN113553589B (en) | 2021-07-30 | 2021-07-30 | Extraction method, device and application of malicious software propagation characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113553589A CN113553589A (en) | 2021-10-26 |
CN113553589B true CN113553589B (en) | 2022-09-02 |
Family ID: 78105013
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110870400.8A Active CN113553589B (en) | 2021-07-30 | 2021-07-30 | Extraction method, device and application of malicious software propagation characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113553589B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116738329A (en) * | 2023-05-15 | 2023-09-12 | 国家计算机网络与信息安全管理中心 | A malicious sample classification method, device, electronic device and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649344A (en) * | 2015-10-31 | 2017-05-10 | 华为数字技术(苏州)有限公司 | Network log compression method and apparatus |
CN107222511A (en) * | 2017-07-25 | 2017-09-29 | 深信服科技股份有限公司 | Detection method and device, computer installation and the readable storage medium storing program for executing of Malware |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107733937A (en) * | 2017-12-01 | 2018-02-23 | 广东奥飞数据科技股份有限公司 | A kind of Abnormal network traffic detection method |
CN111159413A (en) * | 2019-12-31 | 2020-05-15 | 深信服科技股份有限公司 | Log clustering method, device, equipment and storage medium |
CN111447232A (en) * | 2020-03-30 | 2020-07-24 | 杭州迪普科技股份有限公司 | Network flow detection method and device |
CN111797997A (en) * | 2020-07-08 | 2020-10-20 | 北京天融信网络安全技术有限公司 | Network intrusion detection method, model building method, device and electronic device |
2021
- 2021-07-30 CN CN202110870400.8A (CN113553589B), Active
Also Published As
Publication number | Publication date |
---|---|
CN113553589A (en) | 2021-10-26 |
Similar Documents
Publication | Title |
---|---|
US10104101B1 (en) | Method and apparatus for intelligent aggregation of threat behavior for the detection of malware | |
US9912691B2 (en) | Fuzzy hash of behavioral results | |
EP3905622A1 (en) | Botnet detection method and system, and storage medium | |
CN106209759B (en) | Detect suspicious files residing on the network | |
CN111355697B (en) | Detection method, device, equipment and storage medium for botnet domain name family | |
CN109194680B (en) | Network attack identification method, device and equipment | |
US20180302430A1 (en) | SYSTEM AND METHOD FOR DETECTING CREATION OF MALICIOUS new USER ACCOUNTS BY AN ATTACKER | |
CN110365674B (en) | Method, server and system for predicting network attack surface | |
JP6174520B2 (en) | Malignant communication pattern detection device, malignant communication pattern detection method, and malignant communication pattern detection program | |
CN111371778B (en) | Attack group identification method, device, computing equipment and medium | |
CN107454040B (en) | Application login method and device | |
US9871810B1 (en) | Using tunable metrics for iterative discovery of groups of alert types identifying complex multipart attacks with different properties | |
WO2017140710A1 (en) | Detection of malware in communications | |
US9641595B2 (en) | System management apparatus, system management method, and storage medium | |
CN113553589B (en) | Extraction method, device and application of malicious software propagation characteristics | |
US10963562B2 (en) | Malicious event detection device, malicious event detection method, and malicious event detection program | |
He et al. | On‐Device Detection of Repackaged Android Malware via Traffic Clustering | |
CN113691483A (en) | Method, device and equipment for detecting abnormal user equipment and storage medium | |
Jain et al. | Towards mining latent client identifiers from network traffic | |
CN110392032B (en) | Method, device and storage medium for detecting abnormal URL | |
CN113098852B (en) | Log processing method and device | |
US11159548B2 (en) | Analysis method, analysis device, and analysis program | |
CN111148185A (en) | Method and device for establishing user relationship | |
CN115935356A (en) | Software security testing method, system and application | |
CN106919836B (en) | Application port detection method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Method, device, and application for extracting propagation characteristics of malicious software; Granted publication date: 20220902; Pledgee: Industrial and Commercial Bank of China Limited Nanjing Science and technology sub branch; Pledgor: JIANGSU YIANLIAN NETWORK TECHNOLOGY Co.,Ltd.; Registration number: Y2024980036804 |