CN112148305B - Application detection method, device, computer equipment and readable storage medium
- Publication number
- CN112148305B (application number CN202011171532.3A)
- Authority
- CN
- China
- Prior art keywords
- target
- code block
- code
- hash value
- block
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F8/53: Decompilation; Disassembly (under G06F8/40, Transformation of program code)
- G06F11/3684: Test management for test design, e.g. generating new test cases
- G06F11/3688: Test management for test execution, e.g. scheduling of test suites
- G06F8/70: Software maintenance or management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Stored Programmes (AREA)
Abstract
The embodiments of the invention provide an application detection method, an application detection device, computer equipment and a readable storage medium. The method comprises: in response to a code detection operation for a target application, partitioning a target code block from the program code of the target application; calculating the code similarity between the target code block and at least one reference code block, where each reference code block corresponds to one software development kit (SDK); determining, according to the code similarity between the target code block and each reference code block, a matching code block for the target code block from the at least one reference code block; and adding target mark information to the target code block according to the matching code block, where the target mark information indicates that the target code block belongs to the SDK corresponding to the matching code block. In this way, the SDKs included in an application can be detected rapidly and the efficiency of application detection is improved.
Description
Technical Field
The embodiments of the invention relate to the field of computer technology, and in particular to an application detection method, an application detection device, computer equipment and a readable storage medium.
Background
Currently, to improve development efficiency and reduce cost, application developers commonly use third-party software development kits (Software Development Kit, SDK). An application (APP) formally released on an app store generally integrates about 20 or more third-party SDKs. As third-party SDKs have become widely used, related security problems have grown increasingly frequent, for example security vulnerabilities in SDKs, covert collection of users' private data, and malicious operations that some malicious SDKs carry out through the host APP. Different types of SDKs carry different security risks, so it is necessary to detect which types of third-party SDKs are integrated in an APP. Existing third-party SDK detection is mainly based on feature matching: features are extracted from an SDK's code package to identify that SDK, and during APP detection, if those features are found, the SDK is deemed to be integrated in the APP. Because this approach depends on obtaining each SDK's code package, its detection efficiency is low.
Disclosure of Invention
The embodiment of the invention provides an application detection method, an application detection device, computer equipment and a readable storage medium, which can rapidly detect SDKs included in applications and improve the application detection efficiency.
In one aspect, an embodiment of the present invention provides an application detection method, where the method includes:
in response to a code detection operation for the target application, partitioning a target code block from the program code of the target application;
calculating the code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit;
determining a matching code block that matches the target code block from the at least one reference code block according to the code similarity between the target code block and each reference code block; and
adding target mark information to the target code block according to the matching code block, wherein the target mark information is used for indicating that the target code block belongs to the software development kit corresponding to the matching code block.
In another aspect, an embodiment of the present application provides an application detection apparatus, including:
a dividing unit, configured to partition a target code block from the program code of the target application in response to a code detection operation for the target application;
a calculating unit, configured to calculate the code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit;
a determining unit, configured to determine a matching code block that matches the target code block from the at least one reference code block according to the code similarity between the target code block and each reference code block; and
an adding unit, configured to add target mark information to the target code block according to the matching code block, wherein the target mark information is used for indicating that the target code block belongs to the software development kit corresponding to the matching code block.
In yet another aspect, an embodiment of the present application provides a computer device, where the computer device includes an input device and an output device, and the computer device further includes:
a processor adapted to implement one or more instructions; and
a computer storage medium storing one or more instructions adapted to be loaded by the processor to perform the following steps:
in response to a code detection operation for the target application, partitioning a target code block from the program code of the target application;
calculating the code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit;
determining a matching code block that matches the target code block from the at least one reference code block according to the code similarity between the target code block and each reference code block; and
adding target mark information to the target code block according to the matching code block, wherein the target mark information is used for indicating that the target code block belongs to the software development kit corresponding to the matching code block.
In yet another aspect, embodiments of the present application provide a computer storage medium storing one or more instructions adapted to be loaded by a processor to perform the following steps:
in response to a code detection operation for the target application, partitioning a target code block from the program code of the target application;
calculating the code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit;
determining a matching code block that matches the target code block from the at least one reference code block according to the code similarity between the target code block and each reference code block; and
adding target mark information to the target code block according to the matching code block, wherein the target mark information is used for indicating that the target code block belongs to the software development kit corresponding to the matching code block.
In the embodiments of the invention, in response to a code detection operation for the target application, the computer device cuts target code blocks out of the program code of the target application, calculates the code similarity between a target code block and at least one reference code block, determines from the at least one reference code block a matching code block for the target code block according to those similarities, and adds target mark information to the target code block according to the matching code block. The computer device does not need to obtain an SDK's source package (source code package) or jar package (Java archive) in order to extract the SDK's features; it only needs to cut the program code of the target application and analyze the similarity between the resulting code blocks and the reference code blocks to quickly detect the SDKs integrated in the target application, which effectively improves the efficiency of application detection.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application detection system according to an embodiment of the present invention;
FIG. 2a is a schematic flow chart of an application detection scheme according to an embodiment of the present invention;
FIG. 2b is a schematic diagram of a computer device according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of an application detection method according to an embodiment of the present invention;
FIG. 4a is a schematic diagram of a hierarchical structure style of program code according to an embodiment of the present invention;
FIG. 4b is a schematic diagram of a code provided by an embodiment of the present invention;
FIG. 5 is a flowchart of another application detection method according to an embodiment of the present invention;
FIG. 6a is a schematic diagram of calculating hash values of operation instructions according to an embodiment of the present invention;
FIG. 6b is a schematic diagram of calculating a target hash value and a reference hash value according to an embodiment of the present invention;
FIG. 7 is a flowchart of a specific application detection method according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an application detection device according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
To better detect which software development kits are integrated in an application (for example, third-party login and sharing SDKs, payment SDKs, push notification SDKs, advertising SDKs, data statistics SDKs, and the like), the embodiments of the present application provide an application detection scheme. The execution subject of the scheme may be a computer device, which may be a terminal device (hereinafter, a terminal) or a server. When the computer device is a server, the embodiments also provide the application detection system shown in FIG. 1, which may comprise at least one terminal 101 and a server (i.e., the computer device) 102. The terminal 101 and the server 102 may be connected directly or indirectly through wired or wireless communication, which is not limited herein. The terminal may be a smartphone, a tablet computer, a notebook computer, a desktop computer, etc.; the server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), and big data and artificial intelligence platforms.
In practice, when a developer or other user wants to perform application detection on a target application, the user can issue a code detection operation for the target application to the computer device through an application detection interface. In one embodiment, the application detection interface may include an application identification input box; in this embodiment, the code detection operation may be an operation of entering the application identification of the target application in that input box. In another embodiment, the application detection interface may include at least an application icon of the target application; in this embodiment, the code detection operation may be a trigger operation on that icon, such as a click or a press. In response to the code detection operation for the target application, the computer device can use the application detection scheme provided by the embodiments of the present application to detect which software development kits are integrated in the target application. Referring to FIG. 2a, the general principle of the scheme is as follows: the computer device first cuts the program code of the target application into code blocks to obtain at least one code block. For any code block obtained by cutting, it performs similarity analysis between that code block and at least one reference code block, and then marks the code block with an SDK according to the analysis result and the SDK corresponding to each reference code block, so as to indicate the SDK to which the code block belongs. Following this principle, every code block obtained by cutting can be marked.
Since each code block is cut from the program code of the target application, i.e., each code block is part of that program code, marking every code block with its SDK makes it possible to output an SDK detection result for the target application that indicates which SDKs are integrated in it.
In a possible embodiment, to better implement the steps of the application detection scheme, the following modules may be deployed in the computer device: a detection control module, a code block cutting module and a code block marking module, as shown in FIG. 2b. The detection control module is mainly used to call the code block cutting module and the code block marking module to detect which SDKs are integrated in the target application, and to output an SDK detection result indicating those SDKs. Specifically, the detection control module may first call the code block cutting module to cut the program code of the target application into code blocks and obtain the feature parameters of each resulting code block. For any code block obtained by cutting, the detection control module can call the code block marking module to calculate the similarity between the feature parameters of that code block and those of at least one reference code block, and to mark the code block according to the similarity results and the SDK corresponding to each reference code block. Optionally, before detecting the target application, the detection control module may also call the code block marking module to perform cluster analysis on the at least one reference code block and, using a marking policy, mark each reference code block with an SDK according to the cluster analysis result, so as to determine the SDK corresponding to each reference code block.
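The interplay of the detection control module and the marking module can be sketched as a small Python control loop. This is a hedged illustration under stated assumptions, not the patented implementation: the `ReferenceBlock` type, the `detect_sdks` function, the Dice-coefficient similarity and the 0.8 threshold are all hypothetical choices, and the code blocks are assumed to arrive already cut and reduced to feature lists by a cutting module.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ReferenceBlock:
    sdk: str        # SDK this reference code block corresponds to
    features: list  # feature parameters, e.g. an opcode sequence


def similarity(a, b):
    """Dice coefficient over multisets of feature items (an illustrative choice)."""
    if not a and not b:
        return 0.0
    overlap = sum((Counter(a) & Counter(b)).values())
    return 2 * overlap / (len(a) + len(b))


def detect_sdks(code_blocks, references, threshold=0.8):
    """Mark each cut code block with the SDK of its best-matching reference block.

    code_blocks maps a (hypothetical) block id to its feature list.
    Returns (marks, sorted list of detected SDKs).
    """
    marks = {}
    for block_id, features in code_blocks.items():
        best_score, best_ref = max(
            ((similarity(features, ref.features), ref) for ref in references),
            key=lambda pair: pair[0],
            default=(0.0, None),
        )
        # Only blocks whose best similarity exceeds the threshold are marked.
        if best_ref is not None and best_score > threshold:
            marks[block_id] = best_ref.sdk
    return marks, sorted(set(marks.values()))
```

Blocks with no sufficiently similar reference stay unmarked, so the returned SDK list only reflects confident matches; a real deployment would tune the threshold and the similarity measure.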
Therefore, the application detection scheme provided by the embodiments of the present application has the following beneficial effect: with this detection flow, the computer device does not need to obtain an SDK's source package (source code package) or jar package (Java archive) to extract the SDK's features; by cutting the program code of the target application and analyzing the similarity between the resulting code blocks and the reference code blocks, it can rapidly detect the SDKs integrated in the target application, effectively improving the efficiency of application detection.
Referring to fig. 3, fig. 3 is a flow chart of an application detection method according to an embodiment of the invention. The application detection method may be performed by the above-mentioned computer device, and may include the following steps S301 to S304:
S301, in response to a code detection operation for a target application, partitioning a target code block from the program code of the target application.
The target application may be a social application, a multimedia playback application, a browser application, or the like. In a specific implementation, when the user wants to detect which software development kits (SDKs) are integrated in the target application, the user may enter a code detection operation for the target application. Accordingly, the computer device may partition target code blocks from the program code of the target application in response to that code detection operation.
Specifically, the computer device first acquires the code file of the target application and decompiles it to obtain the hierarchical structure of the program code. The code file may include, but is not limited to: an Android application package (Android application package, APK), an iOS application package, a WP application package, a Windows application package, and so on; for convenience of explanation, the code file is described below as an APK. The hierarchical structure may include at least one level, and each level may include at least one code package identifier; for each code package identifier in any level other than the top level, a corresponding parent code package identifier can be found in the level above it. The hierarchical structure may take a tree style or the style shown in FIG. 4a. After the hierarchical structure is obtained, the program code of the target application can be cut according to it to obtain at least one code block, and a target code block can then be obtained from the at least one code block.
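The level-by-level hierarchy can be illustrated with a short Python sketch. It assumes, hypothetically, that decompiling the APK has already produced fully qualified class names such as `com.facebook.soloader.SoLoader`; the decompiler itself is not shown, and `build_hierarchy` and `parent_of` are illustrative names rather than part of the described method.

```python
def build_hierarchy(class_names):
    """Group package identifiers by level (1 = top level) from fully qualified class names."""
    levels = {}
    for name in class_names:
        packages = name.split(".")[:-1]  # drop the trailing class name
        for depth in range(1, len(packages) + 1):
            levels.setdefault(depth, set()).add(".".join(packages[:depth]))
    return levels


def parent_of(package):
    """Return the parent package identifier one level up, or None at the top level."""
    return package.rpartition(".")[0] or None
```

For example, `build_hierarchy(["com.facebook.soloader.SoLoader", "com.alipay.sdk.PayTask"])` places `com` in level 1, `com.facebook` and `com.alipay` in level 2, and their subpackages in level 3; every identifier below the top level has its parent in the level directly above, as the text requires.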
When cutting the program code of the target application according to the hierarchical structure to obtain at least one code block, the computer device may, in order to place the code of the same SDK in the same code block as far as possible, use a cutting strategy derived from practical results: code package identifiers that appear in a cutting list are searched for in the hierarchical structure from bottom to top, and the code is cut one level below each identifier found. Accordingly, cutting the program code of the target application according to the hierarchical structure to obtain at least one code block may be implemented as follows:
A preset cutting list is obtained, which contains the code package identifiers to be cut. After the hierarchical structure of the program code is obtained, a starting level to be traversed is determined from the hierarchical structure, and each code package identifier in the starting level is traversed. If the currently traversed target code package identifier is in the cutting list, the code corresponding to each code package identifier in the level below the starting level is cut into a separate code block. If the target code package identifier is not in the cutting list, it is judged whether the first parent code package identifier corresponding to it in a first level (the level directly above the starting level) is in the cutting list. If the first parent code package identifier is in the cutting list, the code corresponding to the target code package identifier is cut into one code block; if not, it is judged whether the second parent code package identifier corresponding to the first parent code package identifier in a second level (the level directly above the first level) is in the cutting list. If the second parent code package identifier is in the cutting list, the code corresponding to each code package identifier in the first level is cut into a separate code block; if not, it is judged whether the third parent code package identifier corresponding to the second parent code package identifier in a third level (the level directly above the second level) is in the cutting list, and so on.
As a specific example, FIG. 4a shows a hierarchical structure with 3 levels obtained by decompiling the code file of a target application; each level includes at least one code package identifier, namely the English identifiers shown in FIG. 4a. When cutting the program code of the target application according to this hierarchical structure, the computer device can determine the 3rd level as the starting level to be traversed and traverse each code package identifier in it. Suppose the currently traversed target code package identifier is soloader and soloader is not in the cutting list. The computer device then judges whether the first parent code package identifier facebook, which corresponds to soloader in the 2nd level (i.e., the first level, located above the starting level), is in the cutting list; if it is, the code corresponding to soloader is cut into one code block. If facebook is not in the cutting list, the computer device judges whether the second parent code package identifier com, which corresponds to facebook in the 1st level (i.e., the second level, located above the first level), is in the cutting list. If com is in the cutting list, the code corresponding to each code package identifier in the 2nd level is cut into a separate code block: the code corresponding to facebook into one code block, the code corresponding to donking into one code block, the code corresponding to alipay into one code block, and so on.
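The bottom-up search amounts to: for each package, find the deepest identifier on its path that is in the cutting list, and cut one level below it. A minimal Python sketch under these assumptions: cutting-list entries are dotted package prefixes such as `com` or `com.facebook`, code is grouped per fully qualified class name, and a package with no ancestor in the cutting list falls back to its top-level package (the text above does not specify this case). `block_key` and `cut_code_blocks` are hypothetical names.

```python
from collections import defaultdict


def block_key(package, cut_list):
    """Return the package prefix that names the code block this package falls into."""
    parts = package.split(".")
    # Search bottom-up: the package itself, then its parent, grandparent, ...
    for i in range(len(parts) - 1, -1, -1):
        if ".".join(parts[: i + 1]) in cut_list:
            # Cut one level below the matched identifier (slicing past the end
            # keeps the whole path when the package itself is in the list).
            return ".".join(parts[: i + 2])
    # Assumption: with no match anywhere on the path, use the top-level package.
    return parts[0]


def cut_code_blocks(class_names, cut_list):
    """Group fully qualified class names into code blocks keyed by package prefix."""
    blocks = defaultdict(list)
    for name in class_names:
        package = name.rpartition(".")[0]
        blocks[block_key(package, cut_list)].append(name)
    return dict(blocks)
```

With `com` in the cutting list, the classes under `com.facebook` and under `com.alipay` land in separate blocks, mirroring the FIG. 4a example; putting `com.facebook` in the list instead gives `com.facebook.soloader` its own block.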
S302, calculating the code similarity between the target code block and at least one reference code block.
In a specific implementation, the computer device may first obtain at least one reference code block, where each reference code block corresponds to one software development kit; for example, the computer device may obtain the reference code blocks from its local storage space or from a software development kit feature library. It then calculates the code similarity between the target code block and the at least one reference code block. In one embodiment, the computer device may compare the program code of the target code block with each reference code block, determine the degree of repetition between the target code block and each reference code block, and calculate the code similarity from that degree of repetition.
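One plausible reading of "degree of repetition" is the share of identical lines two code blocks have in common. The sketch below assumes exactly that, scoring it with the Dice coefficient over multisets of stripped, non-empty lines; both the line-level granularity and the names `repetition_similarity` and `similarities` are hypothetical choices, not taken from the method.

```python
from collections import Counter


def repetition_similarity(target_code, reference_code):
    """Share of repeated (identical) lines between two code blocks, in [0, 1].

    Computes 2 * |shared lines| / (|target lines| + |reference lines|),
    counting lines as a multiset so duplicates are matched at most once each.
    """
    target = Counter(line.strip() for line in target_code.splitlines() if line.strip())
    reference = Counter(line.strip() for line in reference_code.splitlines() if line.strip())
    total = sum(target.values()) + sum(reference.values())
    if total == 0:
        return 0.0
    shared = sum((target & reference).values())
    return 2 * shared / total


def similarities(target_code, reference_blocks):
    """Similarity of one target block to every reference block (id -> score)."""
    return {ref_id: repetition_similarity(target_code, code)
            for ref_id, code in reference_blocks.items()}
```

Raw line repetition is easily defeated by renaming, which is why the next embodiment compares obfuscation-resistant feature parameters instead.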
In another embodiment, code obfuscation is taken into account: in the current development ecosystem, third-party SDK providers typically require application developers to obfuscate the SDK's code when integrating it, so the same SDK has different code content in different APPs. Code obfuscation can be applied to program source code and to the intermediate code produced by compiling it; it refers to converting a program's code into a form that is functionally equivalent but difficult to read and understand. To overcome the influence of code obfuscation on code block similarity calculation, when calculating the code similarity between the target code block and the at least one reference code block, the computer device may obtain the feature parameters of the target code block and of each reference code block and perform the similarity calculation on those feature parameters.
Feature parameters are parameters that do not change under code obfuscation. Research shows that, to reduce the readability of program code, code obfuscation replaces meaningful class names, function names, variable names and similar parameters with hard-to-read, meaningless names; however, to keep the code's functionality consistent, obfuscation generally does not alter the operation codes, i.e., the operation instructions, in the program code. For example, the code shown in FIG. 4b contains variable names such as v3 and v6, which are typically replaced with other names when the code is obfuscated. The computer device may therefore extract the operation instructions in the target code block as feature parameters of the target code block, which is effective against most code obfuscation schemes. In addition, since code obfuscation generally does not affect the hierarchical structure of code blocks, the computer device may also use the hierarchical structure of the target code block as a feature parameter.
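To show why operation instructions survive renaming, here is a minimal Python sketch that keeps only the instruction mnemonic of each line. It assumes, as a hypothesis, that the decompiled code is in smali text form (the register-style names v3 and v6 in FIG. 4b suggest Dalvik code, but the text does not say so); `extract_opcodes` is an illustrative name.

```python
def extract_opcodes(smali_code):
    """Keep only the instruction mnemonic of each line (e.g. 'invoke-virtual').

    Directives ('.method', '.end method', ...), labels (':cond_0') and comments
    ('#') are skipped; registers, class names and method names are discarded,
    which is what makes the result insensitive to renaming-based obfuscation.
    """
    opcodes = []
    for raw in smali_code.splitlines():
        line = raw.strip()
        if not line or line.startswith((".", ":", "#")):
            continue
        opcodes.append(line.split()[0])
    return opcodes
```

An original method and its obfuscated copy, which differ only in method, class and register names, yield the same mnemonic sequence, so any similarity measure computed on these sequences is unaffected by the renaming.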
S303, determining a matching code block matched with the target code block from at least one reference code block according to the code similarity between the target code block and each reference code block.
In one specific implementation, the computer device selects, according to the code similarity between the target code block and each reference code block, the reference code block with the highest code similarity to the target code block from the at least one reference code block as the matching code block. That is, in this implementation, the matching code block is the reference code block whose code is most similar to that of the target code block.
In another specific implementation, the computer device may determine whether the code similarity between the target code block and each reference code block is greater than a threshold, where the threshold may be set empirically and as needed. If any code similarity exceeds the threshold, the target code block is considered similar to the corresponding reference code block, and each reference code block whose code similarity exceeds the threshold can be determined as a matching code block for the target code block. That is, in this implementation, a matching code block is any reference code block whose code similarity to the target code block is greater than the threshold.
In yet another specific implementation, the computer device may determine whether the code similarity between the target code block and each reference code block is greater than the threshold; if one or more code similarities exceed the threshold, the largest of them is determined, and the reference code block corresponding to that largest code similarity is determined as the matching code block of the target code block. That is, in this implementation, a matching code block is the reference code block whose code similarity to the target code block both exceeds the threshold and is the greatest.
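The three matching strategies can be sketched as follows, assuming the code similarities are already available as a mapping from a reference block identifier to a score where higher means more similar (with a Hamming distance, the comparisons would be inverted). All names and numbers are illustrative.

```python
from typing import Optional


def match_highest(similarities: dict[str, float]) -> Optional[str]:
    """Strategy 1: the reference block with the highest code similarity."""
    if not similarities:
        return None
    return max(similarities, key=similarities.get)


def match_above_threshold(similarities: dict[str, float], threshold: float) -> list[str]:
    """Strategy 2: every reference block whose similarity exceeds the threshold."""
    return [ref for ref, score in similarities.items() if score > threshold]


def match_highest_above_threshold(similarities: dict[str, float], threshold: float) -> Optional[str]:
    """Strategy 3: the highest similarity among those exceeding the threshold."""
    above = {ref: score for ref, score in similarities.items() if score > threshold}
    return match_highest(above)


sims = {"ref_a": 0.62, "ref_b": 0.91, "ref_c": 0.85}
print(match_highest(sims))                        # ref_b
print(match_above_threshold(sims, 0.8))           # ['ref_b', 'ref_c']
print(match_highest_above_threshold(sims, 0.8))   # ref_b
print(match_highest_above_threshold(sims, 0.95))  # None
```

Strategy 3 differs from strategy 1 only in that it can return no match at all, which avoids labeling a block whose best candidate is still dissimilar.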
S304, adding target mark information for the target code block according to the matched code block.
Research shows that two similar code blocks generally correspond to the same software development kit. Therefore, after the matching code block of the target code block (that is, a code block similar to the target code block) is determined, the software development kit corresponding to the matching code block can be directly taken as the software development kit corresponding to the target code block, and target mark information is added to the target code block to indicate that the target code block belongs to the software development kit corresponding to the matching code block. The target mark information may include a type identifier of that software development kit. With this mark in place, when security detection is performed on the target application, it can be determined directly from the target mark information that the software development kit corresponding to the matching code block is integrated in the target application, and the target application can be detected in a targeted manner using the security detection strategy corresponding to that software development kit, which improves detection efficiency and accuracy.
In the embodiment of the invention, the computer device may, in response to a code detection operation for the target application, cut a target code block out of the program code of the target application and calculate the code similarity between the target code block and at least one reference code block; it then determines, according to the code similarity between the target code block and each reference code block, a matching code block from the at least one reference code block, and adds target mark information to the target code block according to the matching code block.
Referring to fig. 5, fig. 5 is a flowchart of another application detection method according to an embodiment of the present invention; the method may be executed by the computer device. This embodiment is described mainly using the example in which the reference code blocks are obtained from a software development kit feature library, which the computer device may build in advance, before executing the application detection method proposed by the embodiment of the present invention. The construction process is as follows:
First, the program code of at least two reference applications may be acquired and divided into a plurality of reference code blocks. Second, the computer device may perform cluster analysis on the reference code blocks and group them according to the analysis result, where the reference code blocks in one group belong to the same software development kit. Specifically, the computer device may calculate the similarity between any two reference code blocks and determine that any two reference code blocks whose similarity is greater than a threshold belong to the same class, thereby clustering them. Alternatively, the computer device may call a clustering model, obtained in advance by training a neural network model on a large amount of sample data, to cluster and thereby group the reference code blocks. After grouping, the computer device may obtain the marking information of each reference code block according to its group information; the marking information of any reference code block is generated by performing feature joint analysis on the reference code blocks in the group to which it belongs, so as to determine the software development kit to which it belongs. Each reference code block and its marking information may then be added to the software development kit feature library.
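The threshold-based grouping can be sketched with a union-find structure. This assumes a pairwise similarity function that returns a higher score for more similar blocks, and it treats grouping as transitive (if A is similar to B and B to C, all three share a group), which is one reasonable reading of the text rather than a rule the source states; the clustering-model alternative is not shown. The bit-matching similarity and the block values are hypothetical.

```python
from itertools import combinations


def group_by_similarity(blocks, similarity, threshold):
    """Group blocks so that any pair with similarity > threshold shares a group."""
    parent = {b: b for b in blocks}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(blocks, 2):
        if similarity(a, b) > threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for b in blocks:
        groups.setdefault(find(b), []).append(b)
    return list(groups.values())


def bit_similarity(a: str, b: str) -> float:
    """Hypothetical similarity: fraction of positions where two bit strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


reference_blocks = ["101011", "101010", "010100", "010101"]
print(group_by_similarity(reference_blocks, bit_similarity, 0.8))
# [['101011', '101010'], ['010100', '010101']]
```

Each resulting group would then go through the feature joint analysis described next to receive its software development kit marking information.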
In one specific embodiment of obtaining the marking information of each reference code block according to its group information, the group information of each reference code block may be pushed to a manager, who performs feature joint analysis on the reference code blocks in the group to which any reference code block belongs, determines the software development kit to which that block belongs, and then enters the marking information for the reference code blocks in each group into the computer device. Accordingly, the computer device obtains the marking information of each reference code block as entered by the manager; that is, in this embodiment the feature joint analysis is performed manually. In another specific embodiment, the computer device directly performs feature joint analysis on the reference code blocks in the group to which any reference code block belongs to determine the software development kit to which that block belongs, and then obtains the marking information of each reference code block according to its group information; that is, in this embodiment the feature joint analysis is performed automatically by the computer device.
Once the software development kit feature library has been constructed through the above steps, the computer device, upon detecting a code detection operation for the target application, may perform detection processing using the application detection method shown in fig. 5, as described in steps S501-S507 below:
S501, in response to a code detection operation for a target application, acquiring a code file of the target application, wherein the code file comprises the program code of the target application.
S502, decompiling the code file to obtain a hierarchical structure of the program code.
S503, performing code cutting on the program code of the target application according to the hierarchical structure to obtain at least one code block.
S504, selecting a target code block from the at least one code block.
For the implementation of steps S501-S504, refer to the specific implementation of step S301 in fig. 3; details are not repeated here.
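A rough sketch of steps S503-S504, assuming the decompiled hierarchy is represented simply by fully qualified class names and that a code block is the set of classes sharing the same first `depth` package segments. The cutting rule, the `depth` parameter and the class names are hypothetical; the source does not specify the granularity of code cutting.

```python
from collections import defaultdict


def cut_by_package(class_names: list[str], depth: int = 3) -> dict[str, list[str]]:
    """Group fully qualified class names into code blocks keyed by their first
    `depth` package segments (a hypothetical cutting rule)."""
    blocks: dict[str, list[str]] = defaultdict(list)
    for name in class_names:
        package = name.split(".")[:-1]  # drop the simple class name
        key = ".".join(package[:depth]) or "<default>"
        blocks[key].append(name)
    return dict(blocks)


decompiled_classes = [
    "com.example.app.MainActivity",
    "com.example.app.ui.Settings",
    "com.adsdk.core.AdLoader",
    "com.adsdk.core.net.Request",
    "a.b.c",
]
code_blocks = cut_by_package(decompiled_classes)
for key, members in code_blocks.items():
    print(key, members)
target_code_block = code_blocks["com.adsdk.core"]  # S504: pick one block to check
```

Obfuscators may shorten package names (as in `a.b.c`), so a real cutting rule would likely need to be more robust than a fixed prefix depth.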
S505, calculating the code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit.
In a specific implementation, the code similarity between the target code block and each reference code block is calculated in the same way, so the following description takes the calculation of the code similarity between the target code block and any one reference code block as an example. Specifically, step S505 may include the following steps s11-s12:
s11, acquiring the characteristic parameters of the target code block and the characteristic parameters of any reference code block.
From the foregoing, the characteristic parameter may be an operation instruction or a hierarchy. When the characteristic parameter is an operation instruction, and the operation instruction includes a plurality of target instruction fragments, a specific implementation manner of obtaining the characteristic parameter of the target code block by the computer device may be: at least one class can be obtained from the target code block, each class comprises at least one method and at least one variable, instruction extraction processing is carried out on each method in each class, a plurality of candidate instruction fragments are obtained, and a plurality of target instruction fragments are selected from the plurality of candidate instruction fragments.
Specifically, the computer device may directly use each candidate instruction fragment of the plurality of candidate instruction fragments as a target instruction fragment, thereby obtaining a plurality of target instruction fragments. Or the computer device may first determine the instruction length of each candidate instruction segment, determine whether the instruction length of each candidate instruction segment is greater than a length threshold, and select, from the plurality of candidate instruction segments, a candidate instruction segment having an instruction length greater than the length threshold as the target instruction segment. By the method, candidate instruction fragments with shorter instruction length can be effectively filtered out, so that the subsequent similarity calculation efficiency is improved.
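The sketch below assumes each method's opcode sequence is one candidate instruction fragment and that instruction length means the number of instructions; both are assumptions, since the source leaves the fragment granularity open. Class variables are ignored because only methods contribute instructions, and the sample class and method names are hypothetical.

```python
def candidate_fragments(code_block: dict[str, dict[str, list[str]]]) -> list[tuple[str, ...]]:
    """One candidate fragment per method: its opcode sequence (assumed granularity).
    code_block maps class name -> method name -> opcode list."""
    return [tuple(ops) for methods in code_block.values() for ops in methods.values()]


def select_target_fragments(fragments, length_threshold=None):
    """Keep all candidates, or only those longer than length_threshold."""
    if length_threshold is None:
        return list(fragments)
    return [f for f in fragments if len(f) > length_threshold]


code_block = {
    "AdLoader": {
        "<init>": ["invoke-direct", "return-void"],
        "load": ["const/4", "new-instance", "invoke-direct", "invoke-virtual", "return-void"],
    },
    "Request": {
        "send": ["iget-object", "invoke-virtual", "move-result-object", "return-object"],
    },
}
candidates = candidate_fragments(code_block)
targets = select_target_fragments(candidates, length_threshold=3)
print(len(candidates), len(targets))  # 3 2
```

The short constructor fragment is filtered out, which is the kind of low-information fragment the length threshold is meant to drop.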
It should be noted that the operation instruction of any reference code block may be obtained in the same way as that of the target code block; details are not repeated here.
s12, calculating the feature similarity between the characteristic parameters of the target code block and those of any reference code block to obtain the code similarity between the target code block and that reference code block. Step s12 may be implemented differently depending on the characteristic parameter, as detailed below:
(I) The characteristic parameter is an operation instruction; step s12 is implemented as follows:
First, a target hash algorithm may be used to hash the operation instruction of the target code block to obtain a target hash value. The target hash algorithm may be a locality-sensitive hashing algorithm used for large-scale text similarity calculation; a locality-sensitive hashing algorithm maps a high-dimensional feature vector to a low-dimensional one, so that whether two code blocks are highly similar can be judged from the distance between their vectors. The operation instruction of the target code block includes a plurality of target instruction fragments; referring to fig. 6a, the computer device may first hash each of the target instruction fragments to obtain a hash value set containing the hash value of each instruction fragment. Second, the weight value of each hash value may be determined based on the number of times it occurs in the hash value set. Then, the hash values may be weighted by their weight values and summed to obtain the target hash value.
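To make the hash value set concrete, the following sketch hashes each instruction fragment to a fixed-width bit string. Truncated MD5 from Python's `hashlib` is an illustrative stand-in for whatever per-fragment hash an implementation actually uses; `fragment_hash` and the sample fragments are hypothetical. Identical fragments yield identical hash values, which is what the occurrence counts used for weighting rely on.

```python
import hashlib


def fragment_hash(fragment: tuple[str, ...], bits: int = 64) -> str:
    """Hash one instruction fragment to a `bits`-wide bit string.
    Truncated MD5 is an illustrative choice, not the patent's specified hash."""
    if not 0 < bits <= 128:
        raise ValueError("bits must be between 1 and 128 for an MD5 digest")
    digest = hashlib.md5(" ".join(fragment).encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") >> (128 - bits)  # keep the top bits
    return format(value, f"0{bits}b")


fragments = [
    ("const/4", "return-void"),
    ("invoke-virtual", "move-result"),
    ("const/4", "return-void"),  # repeated fragment -> repeated hash value
]
hash_value_set = [fragment_hash(f) for f in fragments]
print(len(hash_value_set[0]), hash_value_set[0] == hash_value_set[2])  # 64 True
```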
One way to determine the weight value of each hash value from the number of times it appears in the hash value set is to use that number directly as the weight value. For example, if the hash value "100101" appears 2 times in a hash value set, its weight value is 2; if the hash value "101011" appears 3 times, its weight value is 3. Another way is for the computer device to determine, from the number of times each hash value appears in the hash value set, the repetition degree of each hash value in the set, and to use the weight value corresponding to that repetition degree as the weight value of the hash value. In a specific implementation, the correspondence between repetition degree and weight value is preset; for example, a repetition degree of 0 may correspond to a weight value of 1, a repetition degree of 20% to a weight value of 2, and so on. If the hash value set contains 5 hash values and the hash value "100101" appears once in it, the repetition degree of "100101" is 20%, so its weight value is 2.
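Both weighting options can be sketched as below. The preset table is hypothetical: it reuses the pairs from the text (degree 0 maps to weight 1, degree 20% maps to weight 2) and adds a 40%-to-3 entry for illustration. The repetition degree is taken as the occurrence count divided by the size of the hash value set, matching the one-in-five equals 20% example.

```python
from collections import Counter


def weights_by_count(hash_value_set: list[str]) -> dict[str, int]:
    """Option 1: a hash value's occurrence count is its weight."""
    return dict(Counter(hash_value_set))


def weights_by_repetition(hash_value_set: list[str], table: list[tuple[float, int]]) -> dict[str, int]:
    """Option 2: repetition degree = count / set size, mapped through a preset
    table of (minimum degree, weight) pairs; the table itself is hypothetical."""
    total = len(hash_value_set)
    weights = {}
    for value, count in Counter(hash_value_set).items():
        degree = count / total
        # Use the weight of the highest table entry whose minimum degree is reached.
        eligible = [weight for min_degree, weight in sorted(table) if degree >= min_degree]
        weights[value] = eligible[-1] if eligible else 1
    return weights


hashes = ["100101", "101011", "101011", "110000", "001111"]
preset_table = [(0.0, 1), (0.2, 2), (0.4, 3)]
print(weights_by_count(hashes))
# {'100101': 1, '101011': 2, '110000': 1, '001111': 1}
print(weights_by_repetition(hashes, preset_table))
# {'100101': 2, '101011': 3, '110000': 2, '001111': 2}
```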
Each hash value consists of bits that take either a first value (e.g., "0") or a second value (e.g., "1"). Correspondingly, the hash values may be weighted and summed using their weight values to obtain the target hash value as follows: for any hash value, each bit of the hash value is multiplied by its weight value according to a preset multiplication principle to obtain the weighted result of that hash value. The preset multiplication principle specifies that if the current bit is the second value, the weight value is applied positively, and if the current bit is the first value, the weight value is applied negatively. For example, let a hash value be "100101" with weight value 2. Its first bit is 1, the second value, so the computer device applies the weight value 2 positively to obtain 2; its second bit is 0, the first value, so the computer device applies the weight value 2 negatively to obtain -2. Weighting every bit of "100101" in the same way gives the weighted result (2, -2, -2, 2, -2, 2).
The weighted result of each hash value is obtained according to this principle. The weighted results of all hash values are summed to obtain a candidate hash value, from which the target hash value is determined. In one embodiment, the candidate hash value may be used directly as the target hash value; in another embodiment, to facilitate subsequent calculation, the candidate hash value may further undergo dimension reduction to obtain the target hash value. Dimension reduction means that each bit of the candidate hash value that is greater than 0 is set to the second value, and each bit that is less than or equal to 0 is set to the first value.
For example, suppose there are two hash values: hash value A is "100101" with weight value 2, and hash value B is "101011" with weight value 3. According to the weighting principle, the weighted result of A is (2, -2, -2, 2, -2, 2) and that of B is (3, -3, 3, -3, 3, 3). Summing the two weighted results bit by bit gives (2+3, (-2)+(-3), (-2)+3, 2+(-3), (-2)+3, 2+3), that is, the candidate hash value (5, -5, 1, -1, 1, 5). Dimension reduction is then applied: the first bit is 5, which is greater than 0, so it becomes 1; the second bit is -5, which is less than 0, so it becomes 0; and so on, giving the target hash value "101011".
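The weighting, summation and dimension reduction steps can be reproduced in a few lines of Python. This is a minimal SimHash-style sketch that follows the bit-to-sign rule of the worked example (bit 1 contributes +weight, bit 0 contributes -weight); the function names are hypothetical, and the printed values match the candidate and target hash values computed above.

```python
def weighted_bits(hash_value: str, weight: int) -> list[int]:
    """Apply the weight positively to '1' bits and negatively to '0' bits."""
    return [weight if bit == "1" else -weight for bit in hash_value]


def combine(weights: dict[str, int]) -> tuple[list[int], str]:
    """Sum the weighted results bit by bit (candidate hash value), then reduce
    each position to '1' if it is > 0 and '0' otherwise (target hash value)."""
    width = len(next(iter(weights)))
    candidate = [0] * width
    for hash_value, weight in weights.items():
        for i, contribution in enumerate(weighted_bits(hash_value, weight)):
            candidate[i] += contribution
    target = "".join("1" if total > 0 else "0" for total in candidate)
    return candidate, target


print(weighted_bits("100101", 2))  # [2, -2, -2, 2, -2, 2]
candidate, target = combine({"100101": 2, "101011": 3})
print(candidate)  # [5, -5, 1, -1, 1, 5]
print(target)     # 101011
```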
Second, the target hash algorithm is used to hash the operation instruction of any reference code block to obtain a reference hash value. This is done in the same way as hashing the operation instruction of the target code block to obtain the target hash value, and is not described again here.
Then, a distance operation may be performed on the target hash value and the reference hash value to obtain the code similarity between the target code block and that reference code block. The distance operation may use, for example, the Hamming distance formula or the Euclidean distance formula. Taking the Hamming distance formula as an example and referring to fig. 6b, the computer device performs the distance operation on the target hash value and the reference hash value using the Hamming distance formula to obtain the Hamming distance between the target code block and the reference code block, and determines the code similarity between them from this Hamming distance, so as to judge whether the two code blocks are similar. In a specific implementation, the computer device may perform an exclusive-OR (XOR) operation on the target hash value and the reference hash value; XOR yields 1 for each bit position where the two values differ and 0 where they are the same. The number of 1s in the XOR result is the Hamming distance, which serves as the code similarity between the target code block and the reference code block; a smaller Hamming distance indicates more similar code blocks.
For example, for the target hash value "101001" and the reference hash value "110101", XOR gives "011100", which contains three 1s, so the Hamming distance, and thus the code similarity, between the target code block and the reference code block is 3.
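A minimal sketch of the XOR-and-count Hamming distance on bit-string hash values. The optional `normalized_similarity` score (1 minus distance over width) is an assumption added for cases where a higher-is-more-similar score is preferred; the source does not specify it.

```python
def hamming_distance(a: str, b: str) -> int:
    """XOR two equal-width bit strings and count the differing (1) bits."""
    if len(a) != len(b):
        raise ValueError("hash values must have the same width")
    return bin(int(a, 2) ^ int(b, 2)).count("1")


def normalized_similarity(a: str, b: str) -> float:
    """Assumed convenience score in [0, 1]; 1.0 means identical hash values."""
    return 1 - hamming_distance(a, b) / len(a)


print(format(int("101001", 2) ^ int("110101", 2), "06b"))  # 011100
print(hamming_distance("101001", "110101"))                # 3
print(normalized_similarity("101001", "110101"))           # 0.5
```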
(II) The characteristic parameter is a hierarchical structure; step s12 is implemented as follows:
The computer device may compare the hierarchical structure of the target code block with that of any reference code block along multiple dimensions and determine the code similarity between the two from the comparison result, so that the reference code block that best matches the target code block can be determined from the at least one reference code block according to the code similarity. The dimensions may include the number of levels in the hierarchical structure, the code package identifiers contained in the hierarchical structure, and the like.
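One hypothetical way to combine the two dimensions named above into a single score: compare level counts by their ratio and package identifiers by Jaccard similarity, then take a weighted average. The equal weights, the Jaccard measure and the dictionary layout (`levels`, `packages`) are all assumptions made for illustration.

```python
def hierarchy_similarity(a: dict, b: dict, w_levels: float = 0.5, w_packages: float = 0.5) -> float:
    """Weighted average of two per-dimension scores (illustrative weights):
    level-count ratio and Jaccard similarity of package identifiers."""
    if a["levels"] == b["levels"]:
        level_score = 1.0
    else:
        level_score = min(a["levels"], b["levels"]) / max(a["levels"], b["levels"])
    pa, pb = set(a["packages"]), set(b["packages"])
    union = pa | pb
    package_score = len(pa & pb) / len(union) if union else 1.0
    return w_levels * level_score + w_packages * package_score


target = {"levels": 4, "packages": ["com.adsdk.core", "com.adsdk.core.net"]}
reference = {"levels": 4, "packages": ["com.adsdk.core", "com.adsdk.core.net", "com.adsdk.core.util"]}
print(round(hierarchy_similarity(target, reference), 3))  # 0.833
```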
S506, determining a matching code block matched with the target code block from at least one reference code block according to the code similarity between the target code block and each reference code block.
S507, adding target mark information to the target code block according to the matching code block, wherein the target mark information is used to indicate that the target code block belongs to the software development kit corresponding to the matching code block.
For the specific implementation of steps S506-S507, refer to steps S303-S304 above; details are not repeated here. Furthermore, the matching code block is obtained from the software development kit feature library and was cut from the program code of some reference application, while the target code block was cut from the program code of the target application; since the two code blocks are similar, the embodiment of the application can also determine that different applications use the same code block.
In the embodiment of the invention, the computer device may, in response to a code detection operation for the target application, cut a target code block out of the program code of the target application; for any reference code block, it acquires the characteristic parameters of the target code block and of that reference code block and calculates their similarity to obtain the code similarity between the target code block and that reference code block; it then determines, according to the code similarity between the target code block and each reference code block, a matching code block from the at least one reference code block, and adds target mark information to the target code block according to the matching code block. By determining the SDK to which the target code block belongs, applications that integrate the SDK can be identified without extracting features of the SDK itself, so the types of SDK included in an application are detected quickly and the efficiency of application detection is effectively improved.
Based on the application detection method described above, an embodiment of the application further provides a more specific application detection method, shown in fig. 7. This embodiment mainly takes an APK as the example code file of the target application. The specific flow is as follows:
The APK (Android application package) of the target application may be acquired first; dex decompilation is performed on the APK to obtain the hierarchical structure of the program code of the target application, and the program code is cut according to that hierarchical structure to obtain the n code blocks in fig. 7 (code block 1, code block 2, …, code block n). The computer device then hashes the operation instruction of each of the n code blocks, which involves the following 4 steps: (1) fetching a plurality of instruction fragments from the code block; (2) hashing each of the instruction fragments; (3) combining the hash values of the instruction fragments with weights; and (4) performing dimension reduction on the combined result to obtain the hash value of the operation instruction of the code block.
In this way, the hash values of the n code blocks are obtained. The computer device may then calculate the similarity between the hash value of each code block and the hash value of at least one reference code block in the software development kit feature library, and output a recognition result for each code block according to the similarity. Comparing code blocks by the hash values of their operation instructions allows the SDKs integrated in an application to be recognized relatively accurately. The software development kit feature library may be generated by the computer device through cluster analysis before this similarity calculation is performed.
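Tying the pieces together, the sketch below marks each code block with the SDK label of its nearest reference hash value in a hypothetical feature library, and leaves a block unmarked when even the closest reference is more than `max_distance` bits away. The library contents, labels and threshold are invented for the example; the source does not prescribe a distance cutoff.

```python
from typing import Optional


def identify_sdk(block_hash: str, library: list[tuple[str, str]], max_distance: int) -> Optional[str]:
    """Return the SDK label of the nearest reference hash value (by Hamming
    distance), or None if the nearest one is more than max_distance bits away."""
    best_label, best_distance = None, max_distance + 1
    for ref_hash, sdk_label in library:
        distance = bin(int(block_hash, 2) ^ int(ref_hash, 2)).count("1")
        if distance < best_distance:
            best_label, best_distance = sdk_label, distance
    return best_label


# Hypothetical feature library: (reference hash value, SDK label) pairs.
feature_library = [("101011", "ads-sdk"), ("010100", "push-sdk")]
block_hashes = {"code block 1": "101001", "code block 2": "010101", "code block 3": "111111"}

marks = {name: identify_sdk(h, feature_library, max_distance=1) for name, h in block_hashes.items()}
print(marks)
# {'code block 1': 'ads-sdk', 'code block 2': 'push-sdk', 'code block 3': None}
```

Each non-None label plays the role of the target mark information attached to the corresponding code block.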
Based on the above description of the embodiments of the application detection method, the embodiments of the present application also disclose an application detection apparatus, which may be a computer program (including program code) running in the above mentioned computer device. The application detection means may perform the method shown in fig. 3 or fig. 5. Referring to fig. 8, the application detection apparatus may operate the following units:
A dividing unit 801, configured to cut a target code block out of the program code of a target application in response to a code detection operation for the target application;
A calculating unit 802, configured to calculate a code similarity between the target code block and at least one reference code block, where one reference code block corresponds to one software development kit;
A determining unit 803 for determining a matching code block matching the target code block from the at least one reference code block according to the code similarity between the target code block and each reference code block;
An adding unit 804, configured to add target mark information to the target code block according to the matching code block, where the target mark information is used to indicate that the target code block belongs to the software development kit corresponding to the matching code block.
In yet another implementation, the apparatus further includes: an acquisition unit 805 in which:
The acquiring unit 805 is configured to acquire, for any reference code block, a characteristic parameter of the target code block and a characteristic parameter of the any reference code block;
The calculating unit 802 is configured to perform similarity calculation on the feature parameter of the target code block and the feature parameter of any one of the reference code blocks, so as to obtain a code similarity between the target code block and any one of the reference code blocks.
In yet another implementation, the characteristic parameter is an operation instruction, and the operation instruction includes a plurality of target instruction fragments; the obtaining unit 805 is specifically configured to:
Obtaining at least one class from the target code block, each class comprising at least one method and at least one variable;
Performing instruction extraction processing on each method in each class to obtain a plurality of candidate instruction fragments;
and selecting a plurality of target instruction fragments from the plurality of candidate instruction fragments.
In yet another implementation, the determining unit 803 is configured to determine the instruction length of each candidate instruction fragment;
The acquisition unit 805 is configured to select, from the plurality of candidate instruction fragments, a candidate instruction fragment whose instruction length is greater than a length threshold as a target instruction fragment.
In yet another implementation manner, the characteristic parameter is an operation instruction, and the calculating unit 802 is specifically configured to:
performing hash operation on the operation instruction of the target code block by adopting a target hash algorithm to obtain a target hash value; performing hash operation on the operation instruction of any reference code block by adopting the target hash algorithm to obtain a reference hash value;
and performing distance operation on the target hash value and the reference hash value to obtain the code similarity between the target code block and any one of the reference code blocks.
In yet another implementation, the operation instruction of the target code block includes a plurality of target instruction fragments, and the target hash algorithm is a locality-sensitive hashing algorithm; the calculating unit 802 is specifically configured to:
performing hash operation on each target instruction fragment in the target instruction fragments respectively to obtain a hash value set, wherein the hash value set comprises hash values of each target instruction fragment;
Determining the weight value of each hash value according to the occurrence times of each hash value in the hash value set;
and weighting and summing the hash values by adopting the weight value of each hash value to obtain a target hash value.
In yet another implementation manner, the determining unit 803 is specifically configured to:
The number of times each hash value appears in the hash value set is used as a weight value of each hash value; or alternatively
Determining the repetition degree of each hash value in the hash value set according to the number of times it occurs in the set, and taking the weight value corresponding to that repetition degree as the weight value of the hash value.
In yet another implementation manner, the dividing unit 801 is specifically configured to:
In response to a code detection operation for a target application, acquiring a code file of the target application, wherein the code file comprises program codes of the target application;
decompiling the code file to obtain a hierarchical structure of the program code;
code cutting is carried out on the program codes of the target application according to the hierarchical structure, and at least one code block is obtained;
and selecting a target code block from the at least one code block.
In yet another implementation, the at least one reference code block is stored in a software development kit feature library; the apparatus further comprises an analysis unit 806, wherein:
the obtaining unit 805 is further configured to acquire the program code of at least two reference applications and divide it into a plurality of reference code blocks;
the analysis unit 806 is configured to perform cluster analysis on the plurality of reference code blocks and group them according to the analysis result, where reference code blocks in the same group belong to the same software development kit;
the obtaining unit 805 is further configured to obtain the marking information of each reference code block according to its group information, where the marking information of a reference code block is generated by performing joint feature analysis on all reference code blocks in its group to determine the software development kit to which it belongs;
the adding unit 804 is further configured to add each reference code block and its corresponding marking information to the software development kit feature library.
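Building the feature library can be sketched as grouping reference code blocks whose fingerprints lie within a small Hamming distance, keeping only groups that occur in at least two reference applications, and labelling every member with the group's most common block name. The greedy grouping, the 3-bit distance threshold, the minimum app count, and the majority-name label are all assumptions standing in for the unspecified cluster analysis and joint feature analysis; `RefBlock` and the function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RefBlock:
    app: str          # reference application the block was cut from
    name: str         # block key, e.g. a package prefix
    fingerprint: int  # SimHash-style fingerprint of the block


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def group_blocks(blocks: List[RefBlock], max_dist: int = 3) -> List[List[RefBlock]]:
    """Greedy grouping: a block joins the first group with a member within max_dist bits."""
    groups: List[List[RefBlock]] = []
    for block in blocks:
        for group in groups:
            if any(hamming(block.fingerprint, m.fingerprint) <= max_dist for m in group):
                group.append(block)
                break
        else:
            groups.append([block])
    return groups


def build_feature_library(blocks: List[RefBlock], max_dist: int = 3,
                          min_apps: int = 2) -> List[Tuple[RefBlock, str]]:
    """Return (reference block, marking information) pairs for the feature library."""
    library: List[Tuple[RefBlock, str]] = []
    for group in group_blocks(blocks, max_dist):
        if len({b.app for b in group}) < min_apps:
            continue  # seen in only one app: treated here as app-specific, not an SDK
        label = Counter(b.name for b in group).most_common(1)[0][0]
        library.extend((b, label) for b in group)
    return library
```

Requiring a group to span several applications reflects the idea that third-party SDK code recurs across apps while app-specific code does not; the threshold itself is a tunable guess.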
According to one embodiment of the application, the steps of the method shown in fig. 3 or fig. 5 may be performed by the units of the application detection apparatus shown in fig. 8. For example, step S301 shown in fig. 3 is performed by the dividing unit 801 shown in fig. 8, step S302 by the calculating unit 802, step S303 by the determining unit 803, and step S304 by the adding unit 804. As another example, steps S501 to S504 shown in fig. 5 are performed by the dividing unit 801 shown in fig. 8, step S505 by the calculating unit 802, step S506 by the determining unit 803, and step S507 by the adding unit 804.
According to another embodiment of the present application, the units of the application detection apparatus shown in fig. 8 may be separately or entirely combined into one or several other units, or one or more of them may be further split into multiple functionally smaller units; either way, the same operations are achieved without affecting the technical effects of the embodiments of the present application. The above units are divided by logical function; in practical applications, the function of one unit may be implemented by multiple units, or the functions of multiple units may be implemented by one unit. In other embodiments of the present application, the application detection apparatus may also include other units, and in practical applications these functions may be implemented with the assistance of other units or through the cooperation of multiple units.
According to another embodiment of the present application, the application detection apparatus shown in fig. 8 may be constructed, and the application detection method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the method shown in fig. 3 or fig. 5 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and run on the above computing device via that medium.
In the embodiments of the invention, in response to a code detection operation for the target application, the computer device cuts a target code block out of the program code of the target application, calculates the code similarity between the target code block and at least one reference code block, determines from the at least one reference code block a matching code block for the target code block according to those similarities, and adds target marking information to the target code block according to the matching code block. Because the computer device does not need to obtain the SDK's source package or JAR package to extract SDK features, it can quickly detect the SDKs integrated in the target application simply by cutting the application's program code and analyzing the similarity between the resulting code blocks and the reference code blocks, effectively improving the efficiency of application detection.
Based on the above description of the application detection method embodiments, an embodiment of the present application further discloses a computer device. Referring to fig. 9, the computer device may include at least a processor 901, an input device 902, an output device 903, and a computer storage medium 904, which may be connected by a bus or in other ways.
The computer storage medium 904 is a memory device in the computer device for storing programs and data. It can be understood that the computer storage medium 904 may include both a built-in storage medium of the computer device and an extended storage medium supported by the computer device. The computer storage medium 904 provides storage space that stores the operating system of the computer device, together with one or more instructions, which may be one or more computer programs (including program code), suitable for being loaded and executed by the processor 901. The computer storage medium here may be a high-speed RAM; optionally, it may be at least one computer storage medium located remotely from the processor. The processor, which may be called a central processing unit (CPU), is the computing core and control center of the computer device; it is adapted to load and execute the one or more instructions to implement the corresponding method flow or function.
In a possible embodiment, the processor 901 may load and execute one or more first instructions stored in the computer storage medium to implement the corresponding steps of the application detection method embodiments described above; specifically, the one or more first instructions in the computer storage medium are loaded by the processor 901 to perform the following:
in response to a code detection operation for the target application, partitioning a target code block from program code of the target application;
Calculating the code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit;
Determining a matching code block matched with the target code block from the at least one reference code block according to the code similarity between the target code block and each reference code block;
And adding target mark information for the target code block according to the matched code block, wherein the target mark information is used for indicating that the target code block belongs to a software development kit corresponding to the matched code block.
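The matching step reduces to an argmax over the reference code blocks. The sketch assumes each reference entry is a (block name, SDK label, feature value) tuple and takes the similarity function as a parameter so that the feature comparison can be swapped; the tuple layout and `match_block` are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical reference entry: (reference block name, SDK label, feature value).
Reference = Tuple[str, str, int]


def match_block(target_feature: int, references: Sequence[Reference],
                similarity: Callable[[int, int], float]) -> Optional[Tuple[str, str, float]]:
    """Return (name, SDK label, similarity) of the most similar reference, or None if empty."""
    best: Optional[Tuple[str, str, float]] = None
    for name, sdk, feature in references:
        score = similarity(target_feature, feature)
        # Keep the reference block with the maximum code similarity (first one wins on ties).
        if best is None or score > best[2]:
            best = (name, sdk, score)
    return best
```

The SDK label of the returned reference is what would be attached to the target code block as its target marking information.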
In yet another implementation, the processor 901 is specifically configured to:
for any reference code block, acquiring the characteristic parameters of the target code block and the characteristic parameters of any reference code block;
And performing similarity calculation on the characteristic parameters of the target code block and the characteristic parameters of any one of the reference code blocks to obtain the code similarity between the target code block and any one of the reference code blocks.
In yet another implementation, the characteristic parameter is an operation instruction, and the operation instruction includes a plurality of target instruction fragments; the processor 901 is specifically configured to:
Obtaining at least one class from the target code block, each class comprising at least one method and at least one variable;
Performing instruction extraction processing on each method in each class to obtain a plurality of candidate instruction fragments;
and selecting a plurality of target instruction fragments from the plurality of candidate instruction fragments.
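A possible shape for instruction extraction is shown below, assuming the decompiled classes are available as plain data objects holding smali-style instruction lines. Producing one candidate fragment per method by keeping only the opcodes (dropping registers, constants, and symbol names) is an assumption, chosen here because renaming-based obfuscation changes identifiers but not the opcode sequence. `Method`, `ClassDef`, and `candidate_fragments` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Method:
    name: str
    instructions: List[str]  # smali-style lines, e.g. "invoke-virtual {v0}, La;->b()V"


@dataclass
class ClassDef:
    name: str
    fields: List[str] = field(default_factory=list)  # the class's variables
    methods: List[Method] = field(default_factory=list)


def candidate_fragments(classes: List[ClassDef]) -> List[str]:
    """One candidate fragment per method: its opcodes in order, operands dropped."""
    fragments: List[str] = []
    for cls in classes:
        for method in cls.methods:
            # Skip blank lines; the first token of each instruction line is its opcode.
            opcodes = [ins.split()[0] for ins in method.instructions if ins.strip()]
            fragments.append(" ".join(opcodes))
    return fragments
```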
In yet another implementation, the processor 901 is specifically configured to:
Determining the instruction length of each candidate instruction segment;
and selecting a candidate instruction segment with the instruction length larger than a length threshold value from the plurality of candidate instruction segments as a target instruction segment.
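The length filter is a one-liner once instruction length is defined. Here it is assumed to be the number of opcodes in a fragment, and the default threshold of 3 is a placeholder; the rationale in the comment is an editorial assumption rather than a statement from the patent.

```python
from typing import List


def select_target_fragments(candidates: List[str], length_threshold: int = 3) -> List[str]:
    """Keep fragments whose instruction length (opcode count, an assumption) exceeds the threshold."""
    # Very short fragments (getters, empty constructors) tend to recur in unrelated code,
    # so they carry little evidence about which SDK a block belongs to.
    return [c for c in candidates if len(c.split()) > length_threshold]
```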
In yet another implementation, the characteristic parameter is an operation instruction, and the processor 901 is specifically configured to:
performing hash operation on the operation instruction of the target code block by adopting a target hash algorithm to obtain a target hash value; performing hash operation on the operation instruction of any reference code block by adopting the target hash algorithm to obtain a reference hash value;
and performing distance operation on the target hash value and the reference hash value to obtain the code similarity between the target code block and any one of the reference code blocks.
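For bit-string fingerprints such as SimHash values, the distance operation is typically the Hamming distance. The sketch below assumes 64-bit, non-negative hash values and normalises the distance into a similarity in [0, 1]; the normalisation and the function names are illustrative choices, not ones the patent specifies.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two non-negative hash values differ."""
    return bin(a ^ b).count("1")


def code_similarity(target_hash: int, reference_hash: int, bits: int = 64) -> float:
    """Map a Hamming distance of 0..bits onto a similarity of 1.0..0.0 (assumed normalisation)."""
    return 1.0 - hamming_distance(target_hash, reference_hash) / bits
```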
In yet another implementation, the operation instruction of the target code block includes a plurality of target instruction fragments, and the target hash algorithm is a locality-sensitive hash algorithm; the processor 901 is specifically configured to:
performing a hash operation on each of the target instruction fragments to obtain a hash value set containing the hash value of each target instruction fragment;
determining the weight value of each hash value according to the number of times that hash value occurs in the hash value set;
and weighting and summing the hash values using the weight value of each hash value to obtain the target hash value.
In yet another implementation, the processor 901 is specifically configured to:
Taking the number of times each hash value appears in the hash value set as the weight value of that hash value; or alternatively
Determining the repetition degree of each hash value according to the number of times it appears in the hash value set, and taking the weight value corresponding to that repetition degree as the weight value of the hash value.
In yet another implementation, the processor 901 is specifically configured to:
In response to a code detection operation for a target application, acquiring a code file of the target application, wherein the code file comprises program codes of the target application;
decompiling the code file to obtain a hierarchical structure of the program code;
cutting the program code of the target application according to the hierarchical structure to obtain at least one code block;
and selecting a target code block from the at least one code block.
In yet another implementation, the at least one reference code block is stored in a software development kit feature library; the processor 901 is further configured to:
Acquiring program codes of at least two reference applications, and dividing the program codes of the at least two reference applications into a plurality of reference code blocks;
Performing cluster analysis on the plurality of reference code blocks, and performing grouping processing on the plurality of reference code blocks according to analysis results; the reference code blocks in a group belong to the same software development kit;
acquiring the marking information of each reference code block according to its group information, where the marking information of a reference code block is generated by performing joint feature analysis on all reference code blocks in its group to determine the software development kit to which it belongs;
and adding each reference code block and corresponding marking information to a software development kit feature library.
It should be noted that the embodiments of the present application also provide a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps performed in fig. 3 or fig. 5 of the above-described application detection method embodiment.
The above disclosure is merely a preferred embodiment of the present invention and does not limit its scope. Those skilled in the art will understand that implementations of all or part of the above procedures, and equivalent variations made according to the claims, still fall within the scope of the present invention.
Claims (9)
1. An application detection method, comprising:
In response to a code detection operation for a target application, acquiring a code file of the target application, wherein the code file comprises program codes of the target application;
decompiling the code file to obtain a hierarchical structure of the program code;
determining, from the hierarchical structure, a starting level to be traversed, and traversing each code package identifier in the starting level;
if the currently traversed target code package identifier is in a preset cut list, cutting the code corresponding to each code package identifier in the level immediately below the starting level into a separate code block;
if the target code package identifier is not in the cut list, determining whether the first parent code package identifier of the target code package identifier, located in a first level, is in the cut list, the first level being the level immediately above the starting level;
if the first parent code package identifier is in the cut list, cutting the code corresponding to the target code package identifier into one code block;
if the first parent code package identifier is not in the cut list, searching the hierarchical structure upward, level by level starting from the first level, for a parent code package identifier that is in the cut list, and, when the parent code package identifier at a target level is in the cut list, cutting the code corresponding to each code package identifier in the level immediately below the target level into a separate code block;
selecting a target code block from the at least one code block obtained by cutting;
calculating the code similarity between the target code block and at least one reference code block, wherein one reference code block corresponds to one software development kit, and the at least one reference code block is stored in a software development kit feature library;
Determining a matching code block matched with the target code block from the at least one reference code block according to the code similarity between the target code block and each reference code block, wherein the matching code block refers to the reference code block with the maximum code similarity with the target code block;
And adding target mark information for the target code block according to the matched code block, wherein the target mark information is used for indicating that the target code block belongs to a software development kit corresponding to the matched code block.
2. The method of claim 1, wherein the calculating the code similarity between the target code block and at least one reference code block comprises:
for any reference code block, acquiring the characteristic parameters of the target code block and the characteristic parameters of any reference code block;
And performing similarity calculation on the characteristic parameters of the target code block and the characteristic parameters of any one of the reference code blocks to obtain the code similarity between the target code block and any one of the reference code blocks.
3. The method of claim 2, wherein the characteristic parameter is an operation instruction, the operation instruction comprising a plurality of target instruction fragments; the obtaining the characteristic parameters of the target code block includes:
Obtaining at least one class from the target code block, each class comprising at least one method and at least one variable;
Performing instruction extraction processing on each method in each class to obtain a plurality of candidate instruction fragments;
and selecting a plurality of target instruction fragments from the plurality of candidate instruction fragments.
4. The method of claim 3, wherein selecting a plurality of target instruction fragments from the plurality of candidate instruction fragments comprises:
Determining the instruction length of each candidate instruction segment;
and selecting a candidate instruction segment with the instruction length larger than a length threshold value from the plurality of candidate instruction segments as a target instruction segment.
5. The method according to any one of claims 2-4, wherein the characteristic parameter is an operation instruction, and the performing feature similarity calculation on the characteristic parameter of the target code block and the characteristic parameter of any one of the reference code blocks to obtain a code similarity between the target code block and any one of the reference code blocks includes:
performing hash operation on the operation instruction of the target code block by adopting a target hash algorithm to obtain a target hash value; performing hash operation on the operation instruction of any reference code block by adopting the target hash algorithm to obtain a reference hash value;
and performing distance operation on the target hash value and the reference hash value to obtain the code similarity between the target code block and any one of the reference code blocks.
6. The method of claim 5, wherein the operation instructions of the target code block comprise a plurality of target instruction fragments, and the target hash algorithm is a locality-sensitive hash algorithm; the performing a hash operation on the operation instruction of the target code block using a target hash algorithm to obtain a target hash value comprises:
performing a hash operation on each of the target instruction fragments to obtain a hash value set containing the hash value of each target instruction fragment;
determining the weight value of each hash value according to the number of times that hash value occurs in the hash value set;
and weighting and summing the hash values using the weight value of each hash value to obtain the target hash value.
7. The method of claim 6, wherein the determining the weight value for each hash value based on the number of times each hash value occurs in the set of hash values comprises:
taking the number of times each hash value appears in the hash value set as the weight value of that hash value; or alternatively
determining the repetition degree of each hash value according to the number of times it appears in the hash value set, and taking the weight value corresponding to that repetition degree as the weight value of the hash value.
8. The method of claim 1, wherein the method further comprises:
Acquiring program codes of at least two reference applications, and dividing the program codes of the at least two reference applications into a plurality of reference code blocks;
Performing cluster analysis on the plurality of reference code blocks, and performing grouping processing on the plurality of reference code blocks according to analysis results; the reference code blocks in a group belong to the same software development kit;
acquiring the marking information of each reference code block according to its group information, where the marking information of a reference code block is generated by performing joint feature analysis on all reference code blocks in its group to determine the software development kit to which it belongs;
and adding each reference code block and corresponding marking information to a software development kit feature library.
9. An application detection apparatus, comprising:
a segmentation unit, configured to obtain a code file of a target application in response to a code detection operation for the target application, where the code file includes program codes of the target application;
The segmentation unit is further used for decompiling the code file to obtain a hierarchical structure of the program code;
determining, from the hierarchical structure, a starting level to be traversed, and traversing each code package identifier in the starting level;
if the currently traversed target code package identifier is in a preset cut list, cutting the code corresponding to each code package identifier in the level immediately below the starting level into a separate code block;
if the target code package identifier is not in the cut list, determining whether the first parent code package identifier of the target code package identifier, located in a first level, is in the cut list, the first level being the level immediately above the starting level;
if the first parent code package identifier is in the cut list, cutting the code corresponding to the target code package identifier into one code block;
if the first parent code package identifier is not in the cut list, searching the hierarchical structure upward, level by level starting from the first level, for a parent code package identifier that is in the cut list, and, when the parent code package identifier at a target level is in the cut list, cutting the code corresponding to each code package identifier in the level immediately below the target level into a separate code block;
selecting a target code block from the at least one code block obtained by cutting;
A computing unit, configured to compute a code similarity between the target code block and at least one reference code block, where one reference code block corresponds to one software development kit, and the at least one reference code block is stored in a software development kit feature library;
A determining unit, configured to determine, from the at least one reference code block, a matching code block that matches the target code block according to a code similarity between the target code block and each reference code block, where the matching code block is a reference code block with a maximum code similarity with the target code block;
And the adding unit is used for adding target mark information for the target code block according to the matched code block, wherein the target mark information is used for indicating that the target code block belongs to a software development kit corresponding to the matched code block.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011171532.3A CN112148305B (en) | 2020-10-28 | 2020-10-28 | Application detection method, device, computer equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112148305A CN112148305A (en) | 2020-12-29 |
CN112148305B true CN112148305B (en) | 2024-09-10 |
Family ID: 73953484
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112148305B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112685080B (en) * | 2021-01-08 | 2023-08-11 | 深圳开源互联网安全技术有限公司 | Open source component duplicate checking method, system, device and readable storage medium |
CN112732581B (en) * | 2021-01-12 | 2023-03-10 | 京东科技控股股份有限公司 | SDK detection method, device, electronic equipment, system and storage medium |
CN115146264B (en) * | 2021-03-31 | 2024-11-12 | 中国电信股份有限公司 | Application program processing method and device |
CN113805892B (en) * | 2021-09-17 | 2024-04-05 | 杭州云深科技有限公司 | Abnormal APK identification method, electronic equipment and readable storage medium |
CN114416600B (en) * | 2022-03-29 | 2022-06-28 | 腾讯科技(深圳)有限公司 | Application detection method and device, computer equipment and storage medium |
CN115460274B (en) * | 2022-09-13 | 2024-12-24 | 招商银行股份有限公司 | Method, device, apparatus and computer-readable storage medium for discovering third-party SDK |
CN118427635A (en) * | 2024-05-22 | 2024-08-02 | 北京百度网讯科技有限公司 | Application processing method and device, electronic equipment and computer readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9471285B1 (en) * | 2015-07-09 | 2016-10-18 | Synopsys, Inc. | Identifying software components in a software codebase |
CN106803040A (en) * | 2017-01-18 | 2017-06-06 | 腾讯科技(深圳)有限公司 | Virus signature processing method and processing device |
CN110175045A (en) * | 2019-05-20 | 2019-08-27 | 北京邮电大学 | Android application program beats again bag data processing method and processing device |
CN111338622A (en) * | 2020-05-15 | 2020-06-26 | 支付宝(杭州)信息技术有限公司 | Supply chain code identification method, device, server and readable storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DK3011442T3 (en) * | 2013-06-18 | 2021-01-04 | Ciambella Ltd | METHOD AND DEVICE FOR GENERATING A CUSTOM SOFTWARE DEVELOPMENT KIT (SDK) |
US20170242671A1 (en) * | 2016-02-18 | 2017-08-24 | Qualcomm Innovation Center, Inc. | Semantically sensitive code region hash calculation for programming languages |
KR20160100887A (en) * | 2016-08-12 | 2016-08-24 | 충남대학교산학협력단 | Method for detecting malware by code block comparison |
US10048945B1 (en) * | 2017-05-25 | 2018-08-14 | Devfactory Fz-Llc | Library suggestion engine |
CN109710299A (en) * | 2018-12-14 | 2019-05-03 | 平安普惠企业管理有限公司 | An open source class library monitoring method, device, device and computer storage medium |
US11269601B2 (en) * | 2019-06-27 | 2022-03-08 | Intel Corporation | Internet-based machine programming |
CN111190603B (en) * | 2019-12-18 | 2021-07-06 | 腾讯科技(深圳)有限公司 | Private data detection method and device and computer readable storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |