CN114065308B - Gate-level hardware Trojan horse positioning method and system based on deep learning - Google Patents
Gate-level hardware Trojan horse positioning method and system based on deep learning
- Publication number
- CN114065308B (application CN202111412498.9A)
- Authority
- CN
- China
- Prior art keywords
- path
- module
- paths
- positioning
- sub
- Prior art date
- 2021-11-25
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 27
- 238000013135 deep learning Methods 0.000 title claims abstract description 18
- 238000012549 training Methods 0.000 claims abstract description 65
- 238000001514 detection method Methods 0.000 claims abstract description 50
- 238000012360 testing method Methods 0.000 claims abstract description 40
- 238000007781 pre-processing Methods 0.000 claims abstract description 7
- 238000010845 search algorithm Methods 0.000 claims abstract description 7
- 238000010276 construction Methods 0.000 claims description 7
- 238000012163 sequencing technique Methods 0.000 claims description 6
- 238000004364 calculation method Methods 0.000 claims description 3
- 238000000605 extraction Methods 0.000 claims description 3
- 238000004806 packaging method and process Methods 0.000 claims description 3
- 230000004931 aggregating effect Effects 0.000 claims description 2
- 238000010586 diagram Methods 0.000 claims description 2
- 238000005259 measurement Methods 0.000 claims description 2
- 238000011156 evaluation Methods 0.000 abstract 1
- 230000004807 localization Effects 0.000 abstract 1
- 238000002372 labelling Methods 0.000 description 5
- 238000013461 design Methods 0.000 description 4
- 238000004519 manufacturing process Methods 0.000 description 3
- 238000013527 convolutional neural network Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 239000008358 core component Substances 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 238000012938 design process Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/70—Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
- G06F21/71—Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/32—Circuit design at the digital level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Security & Cryptography (AREA)
- Geometry (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention relates to a gate-level hardware Trojan positioning method and system based on deep learning. The method first obtains seven public gate-level netlist files to build a training set and a test set; during preprocessing, a depth-first search algorithm converts each netlist file into path statements, completing path generation. A TextCNN model for detection and positioning is then constructed and trained. The path set of the test set is input into the model to obtain a pre-detection result; the pre-detection result undergoes path division and construction of virtual positioning coordinates, yielding a short path set SL for positioning; finally, SL is input into the TextCNN model to obtain the positioning result P. The invention enables fast and efficient evaluation of the security of an integrated circuit and even allows threats to be found and targeted.
Description
Technical Field
The invention relates to the fields of computer hardware protection and system-on-chip security, and in particular to a gate-level hardware Trojan positioning method and system based on deep learning.
Background
Integrated circuits (ICs) are the core components of computer hardware, and their design and manufacturing processes are very complex. To reduce costs, many manufacturers outsource part of the IC manufacturing process to so-called third-party vendors, which undoubtedly introduces significant threats to hardware security. A hardware Trojan (HT) is a small piece of circuitry that an attacker inserts into the original IC layout to achieve some malicious purpose. An HT can be inserted at any stage of IC fabrication, and the security threats it poses include altering circuit functions, leaking information, and denying service. Current research on HT detection can be roughly divided into pre-silicon detection, performed before the IC chip is finished, and post-silicon detection, performed after it is finished. Pre-silicon detection clearly reduces cost more, striking a better balance between security and profit. Pre-silicon detection is mainly performed during the IC design stage, and the gate level is the last stage of design, so detecting HT at the gate level is very effective.
An IC design is divided into abstraction levels, ordered from high to low: system level, algorithm level, register-transfer level, gate level, and transistor level. Gate-level detection is a common static detection method that explores new Trojan detection approaches by analyzing the logic structure of the circuit from its gate-level netlist. The key to detecting HT at the gate level is obtaining the netlist file that describes this level, i.e. the gate-level netlist. A gate-level netlist describes the interconnections between circuit elements, which include logic gates and other elements at the same level of abstraction. To date, many efforts have been devoted to preventing and detecting HT at the gate level. The most common approach mines HT features from the gate-level netlist and feeds them into a deep learning model for feature learning, so that HT can be detected effectively. Numerous studies have achieved considerable results, but stopping at the detection stage does not truly resist HT: finding the specific location of an HT is a prerequisite for countering it more precisely, yet studies on locating HT remain very rare.
Disclosure of Invention
In view of the above, the present invention aims to provide a gate-level hardware Trojan positioning method and system based on deep learning that can locate hardware Trojans at the gate level.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A gate-level hardware Trojan positioning method based on deep learning comprises the following steps:
Step A: obtaining seven public gate-level netlist files and dividing the data set by the leave-one-out method to obtain a training set Tr and a test set Ts;
Step B: preprocessing the gate-level netlist files of the training set Tr and the test set Ts obtained in step A and, in combination with a depth-first search algorithm, obtaining the path set of the training set Tr and the path set of the test set Ts;
Step C: constructing and initializing a TextCNN model for detecting and locating HT, and training it on the path set of the training set Tr obtained in step B;
Step D: inputting the path set of the test set Ts obtained in step B into the TextCNN model trained in step C to obtain a pre-detection result;
Step E: performing path division on the pre-detection result obtained in step D and constructing virtual positioning coordinates to obtain a short path set SL for positioning;
Step F: inputting the short path set SL obtained in step E into the TextCNN model trained in step D to obtain the positioning result P.
Further, the step B specifically includes the following steps:
Step B1: traversing the netlist file with a depth-first search algorithm, using the wire nets as intermediaries, to obtain a tree graph G representing the interconnection relations of the different logic gates;
Step B2: based on the tree graph G obtained in step B1, restoring the situation of the real circuit to obtain a number of unlabeled paths, and then combining these into the unlabeled path set of the netlist;
Step B3: performing steps B1 and B2 on the gate-level netlist files of the training set Tr and the test set Ts obtained in step A to finally obtain the unlabeled path sets of the training set Tr and the test set Ts;
Step B4: based on the information in the gate-level netlists of the training set Tr and the test set Ts obtained in step A, assigning labels to the unlabeled paths obtained in step B3 to obtain the labeled path sets of the training set Tr and the test set Ts.
Further, the step C specifically includes:
Step C1: path set of training set Tr obtained in step B Generating a vocabulary for TextCNN model extraction features;
step C2: constructing and initializing TextCNN models;
step C3: path set based on training set Tr obtained in step B The TextCNN model can learn the characteristics of the paths with Trojan and the paths without Trojan respectively, and the training of the model is completed.
Further, the step C1 specifically includes:
Step C11: first converting the path set of the training set Tr obtained in step B into text content;
Step C12: based on the text content obtained in step C11, reading the words one by one and counting the frequency of each word;
Step C13: assigning a sequence number to each word in descending order of its frequency of occurrence, completing the vectorized representation of the words;
Step C14: packaging the words and their corresponding sequence numbers into a dictionary and writing it into a vocabulary file, completing the generation of the vocabulary.
Further, the step D specifically includes:
Step D1: based on the TextCNN model trained in step C, adding a save operation to the last fully connected layer of the model so that the pre-detection result can be recorded;
Step D2: inputting the path set of the test set into the TextCNN model trained in step C to obtain a preliminary detection result set {P_TP, P_FP, P_TN, P_FN}, where P_TP is the set of Trojan paths correctly identified as Trojan paths, P_FP is the set of Trojan-free paths misidentified as Trojan paths, P_TN is the set of Trojan-free paths correctly identified as Trojan-free, and P_FN is the set of Trojan paths misidentified as Trojan-free paths;
Step D3: based on the preliminary detection result set {P_TP, P_FP, P_TN, P_FN} obtained in step D2, selecting the set P_TP of correctly identified Trojan paths as the pre-detection result.
Further, the step E specifically includes:
Step E1: numbering the paths in the pre-detection result obtained in step D to obtain an original long path set LL = {LL_i | i = 1, ..., TP}, where TP is the number of paths contained in the set P_TP of correctly identified Trojan paths obtained in step D2;
Step E2: setting the division length cutlen, dividing the long path LL_i sequentially into groups of cutlen logic gates to obtain several short paths, and setting virtual positioning coordinates for them;
Step E3: performing the operation of step E2 on every path in the original long path set LL to obtain the short path set SL and the virtual positioning coordinate set, completing the path division and the construction of virtual positioning coordinates.
Further, the step E2 specifically includes:
Step E21: setting the length cutlen of the division;
Step E22: for the long path LL_i, calculating the number of short paths num_i that can be generated after its division as num_i = ⌈length_i / cutlen⌉, where length_i denotes the length of the long path LL_i;
Step E23: dividing the long path LL_i sequentially into groups of cutlen logic gates to obtain several short paths, where j is the index of a short path, indicating that it is the j-th short path divided from the long path LL_i;
Step E24: according to the results of steps E22 and E23, setting a virtual positioning coordinate for each short path to record the possible Trojan position, where t_i denotes the t-th division of the original long path LL_i;
Step E25: repeating step E24 to complete the setting of the virtual positioning coordinates of the num_i short paths.
Further, the step F specifically includes:
Step F1: inputting a path from the short path set SL into the TextCNN model trained in step D and predicting its class;
Step F2: if the prediction output by the TextCNN model is a Trojan path, recording the corresponding virtual positioning coordinate in the positioning result P;
Step F3: repeating steps F1 and F2 until all short paths have been processed, and outputting the final positioning result P to complete the positioning.
A gate-level hardware Trojan positioning system based on deep learning comprises:
a path generation module for generating path statements representing the circuit wiring, comprising a search sub-module, a temporary-path sub-module and a label sub-module; the gate-level netlist files of the input training set Tr and test set Ts are first preprocessed, the search sub-module performs a depth-first search on them to obtain a tree graph G representing the interconnection relations of the different logic gates, the temporary-path sub-module then generates the unlabeled path sets of the training set Tr and the test set Ts, and finally the label sub-module assigns labels to the unlabeled paths to generate the labeled path sets of the training set Tr and the test set Ts;
a model generation module for constructing and training the TextCNN model, comprising a vectorization sub-module, a model construction sub-module and a model training sub-module; the vectorization sub-module first generates a vocabulary file from the path set of the training set Tr produced by the label sub-module, the model construction sub-module constructs and initializes the TextCNN model, and finally the model training sub-module inputs the path set into the model and completes its training;
a pre-detection module for obtaining the pre-detection result of the test set Ts, comprising a storage sub-module, a pre-detection sub-module and an output sub-module; the storage sub-module first adds a save operation to the last fully connected layer of the TextCNN model built by the model construction sub-module so that the pre-detection result can be recorded, the pre-detection sub-module then pre-detects the paths in the path set to obtain the preliminary detection result set {P_TP, P_FP, P_TN, P_FN}, and finally the output sub-module outputs the set P_TP of correctly identified Trojan paths as the pre-detection result;
a path division module for dividing the result paths output by the output sub-module into short paths and narrowing the positioning range, comprising a sequencing sub-module, a division sub-module and a virtual-coordinate sub-module; the paths in the pre-detection result P_TP output by the output sub-module are numbered by the sequencing sub-module and divided into several short paths by the division sub-module, and finally the virtual-coordinate sub-module sets a virtual positioning coordinate for each short path;
a positioning module for completing the positioning of the Trojan, comprising a loading sub-module and an output sub-module; the loading sub-module first loads the short paths into the TextCNN model trained by the model generation module, and after prediction the output sub-module selects the paths predicted as Trojan paths and outputs their corresponding virtual positioning coordinates, completing the positioning.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention realizes the detection of hardware Trojans by applying a convolutional neural network to text classification;
2. The invention converts the hardware Trojan detection problem into a binary classification problem, so that the convolutional neural network learns the contextual features of circuit path statements and autonomously discovers the characteristics of Trojan paths and Trojan-free paths for classification. On this basis, the positioning of hardware Trojans is explored: a path segmentation technique is applied to the positioning problem, and the positioning range of a hardware Trojan is narrowed by dividing a long path in the circuit into several short paths;
3. The invention realizes further positioning on the basis of detection, breaking away from the previous coarse-grained localization of hardware Trojans from integrated-circuit images; positioning is achieved at the gate level, so that hardware Trojans can be resisted more effectively from the design stage of the integrated circuit;
4. The invention can be used in an integrated-circuit security detection system to evaluate the security of an integrated circuit and even to find and target threats, so that designers can take countermeasures against them.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a system according to an embodiment of the invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
Referring to FIG. 1, the invention provides a gate-level hardware Trojan positioning method based on deep learning, which comprises the following steps:
Step A: first obtaining seven public gate-level netlist files and dividing the data set by the leave-one-out method to obtain a training set Tr and a test set Ts;
and (B) step (B): preprocessing the gate-level netlist file of the training set Tr and the testing set Ts obtained in the step A, and combining a depth-first search algorithm to obtain a path set of the training set Tr and the testing set Ts AndCompleting generation of a path;
Step B1: traversing the netlist file by using a depth-first search algorithm, taking a wire net as an intermediary, and obtaining a tree graph G representing the interconnection relation of different logic gates;
step B2: based on the tree graph G obtained in the step B1, the condition of a real circuit can be restored, a plurality of non-label paths can be obtained, and then the non-label paths are combined into a non-label path set of the netlist;
Step B3: b1 and B2 are carried out on the gate-level netlist files of the training set Tr and the testing set Ts obtained in the step A, and finally a label-free path set of the training set Tr and the testing set Ts is obtained And
Step B4: based on the information of the gate-level netlists of the training set Tr and the testing set Ts obtained in the step A, the label-free path obtained in the step B3 is labeled, and a labeled path set of the training set Tr and the testing set Ts is obtainedAnd
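By way of illustration of steps B1–B4, the sketch below models the netlist as a directed graph whose edges follow the wire nets between logic-gate instances and enumerates root-to-leaf gate paths with a depth-first search; the gate names, the connectivity, and the Trojan-gate annotation are illustrative assumptions, and parsing the graph out of a real Verilog netlist is omitted.

```python
from collections import defaultdict

# Illustrative gate-level connectivity: gate instance -> gates driven through its output net.
# In practice this graph is parsed from the Verilog netlist (step B1).
graph = defaultdict(list, {
    "U1_NAND2": ["U3_XOR2", "U4_INV"],
    "U2_AND2":  ["U3_XOR2"],
    "U3_XOR2":  ["U5_DFF"],
    "U4_INV":   ["U5_DFF"],
})

def dfs_paths(graph, start):
    """Depth-first traversal from a start gate; each root-to-leaf walk becomes one path sentence."""
    paths, stack = [], [(start, [start])]
    while stack:
        node, path = stack.pop()
        successors = graph.get(node, [])
        if not successors:
            paths.append(" ".join(path))          # unlabeled path statement (step B2)
            continue
        for nxt in successors:
            if nxt not in path:                   # avoid cycles through feedback nets
                stack.append((nxt, path + [nxt]))
    return paths

unlabeled = dfs_paths(graph, "U1_NAND2")
# Step B4: label a path 1 if it contains a known Trojan gate of the training netlist, else 0.
trojan_gates = {"U4_INV"}                         # assumed ground-truth annotation
labeled = [(p, int(any(g in trojan_gates for g in p.split()))) for p in unlabeled]
print(labeled)
```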
Step C: constructing and initializing a TextCNN model for detecting and locating HT and inputting the path set of the training set Tr obtained in step B, completing the construction and training of the model;
Step C1: generating a vocabulary from the path set of the training set Tr obtained in step B, for the TextCNN model to extract features;
Step C11: first converting the path set of the training set Tr obtained in step B into text content;
Step C12: based on the text content obtained in step C11, reading the words one by one and counting the frequency of each word;
Step C13: assigning a sequence number to each word in descending order of its frequency of occurrence, completing the vectorized representation of the words;
Step C14: packaging the words and their corresponding sequence numbers into a dictionary and writing it into a vocabulary file, completing the generation of the vocabulary.
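Steps C11–C14 can be sketched as follows: the path sentences are treated as plain text, word frequencies are counted, sequence numbers are assigned in descending order of frequency, and the resulting dictionary is written to a vocabulary file. The example paths and the reservation of index 0 for padding are assumptions.

```python
import json
from collections import Counter

# Path sentences from the training set Tr (illustrative content).
paths_tr = ["U1_NAND2 U3_XOR2 U5_DFF", "U1_NAND2 U4_INV U5_DFF", "U2_AND2 U3_XOR2 U5_DFF"]

# Steps C11-C12: treat the path set as text and count word frequencies.
freq = Counter(word for sentence in paths_tr for word in sentence.split())

# Step C13: assign sequence numbers from the most frequent word downward
# (index 0 is reserved here for padding, which is an implementation assumption).
vocab = {word: idx for idx, (word, _) in enumerate(freq.most_common(), start=1)}

# Step C14: package the dictionary into a vocabulary file.
with open("vocab.json", "w") as fh:
    json.dump(vocab, fh, indent=2)

# Vectorized representation of one path sentence.
print([vocab[w] for w in paths_tr[0].split()])
```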
Step C2: constructing and initializing the TextCNN model;
Step C3: based on the path set of the training set Tr obtained in step B, letting the TextCNN model learn the characteristics of Trojan paths and of Trojan-free paths respectively, completing the training of the model.
Step D: inputting the path set of the test set Ts obtained in step B into the TextCNN model trained in step C to obtain a pre-detection result;
Step D1: based on the TextCNN model trained in step C, adding a save operation to the last fully connected layer of the model so that the pre-detection result can be recorded;
Step D2: inputting the path set of the test set into the TextCNN model trained in step C to obtain a preliminary detection result set {P_TP, P_FP, P_TN, P_FN}, where P_TP is the set of Trojan paths correctly identified as Trojan paths, P_FP is the set of Trojan-free paths misidentified as Trojan paths, P_TN is the set of Trojan-free paths correctly identified as Trojan-free, and P_FN is the set of Trojan paths misidentified as Trojan-free paths;
Step D3: based on the preliminary detection result set {P_TP, P_FP, P_TN, P_FN} obtained in step D2, selecting only the set P_TP of correctly identified Trojan paths as the pre-detection result for the subsequent positioning.
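Steps D1–D3 amount to recording the model's predictions on the test paths and partitioning them into the four sets of the preliminary detection result, of which only P_TP is carried forward. A minimal sketch, assuming the true labels and the model predictions are already available as lists:

```python
def partition_predictions(paths, labels, preds):
    """Split test paths into {P_TP, P_FP, P_TN, P_FN} from true labels and model predictions
    (1 = Trojan path, 0 = Trojan-free path)."""
    result = {"P_TP": [], "P_FP": [], "P_TN": [], "P_FN": []}
    for path, label, pred in zip(paths, labels, preds):
        if pred == 1 and label == 1:
            result["P_TP"].append(path)      # Trojan path correctly identified
        elif pred == 1 and label == 0:
            result["P_FP"].append(path)      # Trojan-free path misidentified as Trojan
        elif pred == 0 and label == 0:
            result["P_TN"].append(path)      # Trojan-free path correctly identified
        else:
            result["P_FN"].append(path)      # Trojan path missed by the model
    return result

sets = partition_predictions(
    ["U1 U3 U5", "U2 U3 U5", "U1 U4 U5"], labels=[1, 0, 1], preds=[1, 0, 0]
)
pre_detection_result = sets["P_TP"]          # step D3: only P_TP is used for positioning
print(pre_detection_result)
```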
Step E: performing path division on the pre-detection result obtained in step D and constructing virtual positioning coordinates to obtain a short path set SL for positioning;
Step E1: numbering the paths in the pre-detection result obtained in step D to obtain an original long path set LL = {LL_i | i = 1, ..., TP}, where TP is the number of paths contained in the set P_TP of correctly identified Trojan paths obtained in step D2;
Step E2: setting the division length cutlen, dividing the long path LL_i sequentially into groups of cutlen logic gates to obtain several short paths, and setting virtual positioning coordinates for them;
Step E21: setting the division length cutlen;
Step E22: for the long path LL_i, calculating the number of short paths num_i that can be generated after its division as num_i = ⌈length_i / cutlen⌉, i.e. length_i divided by cutlen and rounded up, where length_i denotes the length of the long path LL_i;
Step E23: dividing the long path LL_i sequentially into groups of cutlen logic gates to obtain several short paths, where j is the index of a short path, indicating that it is the j-th short path divided from the long path LL_i;
Step E24: according to the results of steps E22 and E23, setting a virtual positioning coordinate for each short path to record the possible Trojan position, where t_i denotes the t-th division of the original long path LL_i;
Step E25: repeating step E24 to complete the setting of the virtual positioning coordinates of the num_i short paths.
Step E3: performing the operation of step E2 on every path in the original long path set LL to obtain the short path set SL and the virtual positioning coordinate set, completing the path division and the construction of virtual positioning coordinates.
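Steps E1–E3 divide each correctly detected long path into short paths of at most cutlen gates and attach a virtual positioning coordinate to each piece. The coordinate formula itself is not reproduced in this text, so the sketch below uses the pair (path number i, division number j) as a stand-in coordinate; this coordinate scheme is an assumption made purely for illustration.

```python
import math

def divide_paths(long_paths, cutlen=4):
    """Divide each long path LL_i into num_i = ceil(length_i / cutlen) short paths of at most
    cutlen logic gates and attach a virtual positioning coordinate to each short path."""
    short_paths, coords = [], []
    for i, long_path in enumerate(long_paths, start=1):    # step E1: number the paths
        gates = long_path.split()
        num_i = math.ceil(len(gates) / cutlen)              # step E22
        for j in range(1, num_i + 1):                       # steps E23-E25
            piece = " ".join(gates[(j - 1) * cutlen : j * cutlen])
            short_paths.append(piece)
            coords.append((i, j))    # stand-in for the virtual coordinate of the j-th division
    return short_paths, coords

SL, SL_coords = divide_paths(["U1 U2 U3 U4 U5 U6 U7 U8 U9", "U10 U11 U12"], cutlen=4)
print(list(zip(SL, SL_coords)))
```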
Step F: inputting the short path set SL obtained in step E into the TextCNN model trained in step D to obtain the positioning result P.
Step F1: inputting a path from the short path set SL into the TextCNN model trained in step D and predicting its class;
Step F2: if the prediction output by the TextCNN model is a Trojan path, recording the corresponding virtual positioning coordinate in the positioning result P;
Step F3: repeating steps F1 and F2 until all short paths have been processed, and outputting the final positioning result P to complete the positioning.
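Step F then classifies every short path and keeps the virtual coordinates of the paths predicted as Trojan. The sketch below reuses the TextCNN model and vocabulary from the earlier sketches, so the same assumptions (padding index 0, fixed input length) apply.

```python
import torch

def locate_trojans(model, vocab, short_paths, coords, max_len=50):
    """Run the trained TextCNN over each short path; collect the virtual coordinates
    of paths predicted as Trojan (class 1) into the positioning result P."""
    model.eval()
    P = []
    with torch.no_grad():
        for path, coord in zip(short_paths, coords):
            ids = [vocab.get(w, 0) for w in path.split()][:max_len]
            ids += [0] * (max_len - len(ids))                # pad to a fixed length
            logits = model(torch.tensor([ids]))
            if logits.argmax(dim=1).item() == 1:             # predicted as a Trojan path
                P.append(coord)                              # step F2: record the coordinate
    return P                                                 # step F3: final positioning result

# P = locate_trojans(model, vocab, SL, SL_coords)
```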
The invention also provides a gate-level hardware Trojan positioning system based on deep learning, comprising:
a path generation module for generating path statements representing the circuit wiring, comprising a search sub-module, a temporary-path sub-module and a label sub-module; the gate-level netlist files of the input training set Tr and test set Ts are first preprocessed, the search sub-module performs a depth-first search on them to obtain a tree graph G representing the interconnection relations of the different logic gates, the temporary-path sub-module then generates the unlabeled path sets of the training set Tr and the test set Ts, and finally the label sub-module assigns labels to the unlabeled paths to generate the labeled path sets of the training set Tr and the test set Ts;
a model generation module for constructing and training the TextCNN model, comprising a vectorization sub-module, a model construction sub-module and a model training sub-module; the vectorization sub-module first generates a vocabulary file from the path set of the training set Tr produced by the label sub-module, the model construction sub-module constructs and initializes the TextCNN model, and finally the model training sub-module inputs the path set into the model and completes its training;
a pre-detection module for obtaining the pre-detection result of the test set Ts, comprising a storage sub-module, a pre-detection sub-module and an output sub-module; the storage sub-module first adds a save operation to the last fully connected layer of the TextCNN model built by the model construction sub-module so that the pre-detection result can be recorded, the pre-detection sub-module then pre-detects the paths in the path set to obtain the preliminary detection result set {P_TP, P_FP, P_TN, P_FN}, and finally the output sub-module outputs the set P_TP of correctly identified Trojan paths as the pre-detection result;
a path division module for dividing the result paths output by the output sub-module into short paths and narrowing the positioning range, comprising a sequencing sub-module, a division sub-module and a virtual-coordinate sub-module; the paths in the pre-detection result P_TP output by the output sub-module are numbered by the sequencing sub-module and divided into several short paths by the division sub-module, and finally the virtual-coordinate sub-module sets a virtual positioning coordinate for each short path;
a positioning module for completing the positioning of the Trojan, comprising a loading sub-module and an output sub-module; the loading sub-module first loads the short paths into the TextCNN model trained by the model generation module, and after prediction the output sub-module selects the paths predicted as Trojan paths and outputs their corresponding virtual positioning coordinates, completing the positioning.
The foregoing description is only of the preferred embodiments of the invention, and all changes and modifications that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims (9)
1. A gate-level hardware Trojan positioning method based on deep learning, characterized by comprising the following steps:
Step A: obtaining seven public gate-level netlist files and dividing the data set by the leave-one-out method to obtain a training set Tr and a test set Ts;
Step B: preprocessing the gate-level netlist files of the training set Tr and the test set Ts obtained in step A and, in combination with a depth-first search algorithm, obtaining the path set of the training set Tr and the path set of the test set Ts;
Step C: constructing and initializing a TextCNN model for detecting and locating HT, and training it on the path set of the training set Tr obtained in step B;
Step D: inputting the path set of the test set Ts obtained in step B into the TextCNN model trained in step C to obtain a pre-detection result;
Step E: performing path division on the pre-detection result obtained in step D and constructing virtual positioning coordinates to obtain a short path set SL for positioning;
Step F: inputting the short path set SL obtained in step E into the TextCNN model trained in step D to obtain the positioning result P.
2. The gate-level hardware Trojan positioning method based on deep learning according to claim 1, wherein step B specifically comprises:
Step B1: traversing the netlist file with a depth-first search algorithm, using the wire nets as intermediaries, to obtain a tree graph G representing the interconnection relations of the different logic gates;
Step B2: based on the tree graph G obtained in step B1, restoring the situation of the real circuit to obtain a number of unlabeled paths, and then combining these into the unlabeled path set of the netlist;
Step B3: performing steps B1 and B2 on the gate-level netlist files of the training set Tr and the test set Ts obtained in step A to finally obtain the unlabeled path sets of the training set Tr and the test set Ts;
Step B4: based on the information in the gate-level netlists of the training set Tr and the test set Ts obtained in step A, assigning labels to the unlabeled paths obtained in step B3 to obtain the labeled path sets of the training set Tr and the test set Ts.
3. The gate-level hardware Trojan positioning method based on deep learning according to claim 1, wherein step C specifically comprises:
Step C1: generating a vocabulary from the path set of the training set Tr obtained in step B, for the TextCNN model to extract features;
Step C2: constructing and initializing the TextCNN model;
Step C3: based on the path set of the training set Tr obtained in step B, letting the TextCNN model learn the characteristics of Trojan paths and of Trojan-free paths respectively, completing the training of the model.
4. The gate-level hardware Trojan positioning method based on deep learning according to claim 3, wherein step C1 specifically comprises:
Step C11: first converting the path set of the training set Tr obtained in step B into text content;
Step C12: based on the text content obtained in step C11, reading the words one by one and counting the frequency of each word;
Step C13: assigning a sequence number to each word in descending order of its frequency of occurrence, completing the vectorized representation of the words;
Step C14: packaging the words and their corresponding sequence numbers into a dictionary and writing it into a vocabulary file, completing the generation of the vocabulary.
5. The gate-level hardware Trojan positioning method based on deep learning according to claim 1, wherein step D specifically comprises:
Step D1: based on the TextCNN model trained in step C, adding a save operation to the last fully connected layer of the model so that the pre-detection result can be recorded;
Step D2: inputting the path set of the test set into the TextCNN model trained in step C to obtain a preliminary detection result set {P_TP, P_FP, P_TN, P_FN}, where P_TP is the set of Trojan paths correctly identified as Trojan paths, P_FP is the set of Trojan-free paths misidentified as Trojan paths, P_TN is the set of Trojan-free paths correctly identified as Trojan-free, and P_FN is the set of Trojan paths misidentified as Trojan-free paths;
Step D3: based on the preliminary detection result set {P_TP, P_FP, P_TN, P_FN} obtained in step D2, selecting the set P_TP of correctly identified Trojan paths as the pre-detection result.
6. The gate-level hardware Trojan positioning method based on deep learning according to claim 1, wherein step E specifically comprises:
Step E1: numbering the paths in the pre-detection result obtained in step D to obtain an original long path set LL = {LL_i | i = 1, ..., TP}, where TP is the number of paths contained in the set P_TP of correctly identified Trojan paths obtained in step D2;
Step E2: setting the division length cutlen, dividing the long path LL_i sequentially into groups of cutlen logic gates to obtain several short paths, and setting virtual positioning coordinates for them;
Step E3: performing the operation of step E2 on every path in the original long path set LL to obtain the short path set SL and the virtual positioning coordinate set, completing the path division and the construction of virtual positioning coordinates.
7. The gate-level hardware Trojan positioning method based on deep learning according to claim 6, wherein step E2 specifically comprises:
Step E21: setting the division length cutlen;
Step E22: for the long path LL_i, calculating the number of short paths num_i that can be generated after its division as num_i = ⌈length_i / cutlen⌉, where length_i denotes the length of the long path LL_i;
Step E23: dividing the long path LL_i sequentially into groups of cutlen logic gates to obtain several short paths, where j is the index of a short path, indicating that it is the j-th short path divided from the long path LL_i;
Step E24: according to the results of steps E22 and E23, setting a virtual positioning coordinate for each short path to record the possible Trojan position, where t_i denotes the t-th division of the original long path LL_i;
Step E25: repeating step E24 to complete the setting of the virtual positioning coordinates of the num_i short paths.
8. The gate-level hardware Trojan positioning method based on deep learning according to claim 1, wherein step F specifically comprises:
Step F1: inputting a path from the short path set SL into the TextCNN model trained in step D and predicting its class;
Step F2: if the prediction output by the TextCNN model is a Trojan path, recording the corresponding virtual positioning coordinate in the positioning result P;
Step F3: repeating steps F1 and F2 until all short paths have been processed, and outputting the final positioning result P to complete the positioning.
9. A gate-level hardware Trojan positioning system based on deep learning, comprising:
a path generation module, for generating path statements representing the circuit wiring, comprising a search sub-module, a temporary-path sub-module and a label sub-module; the gate-level netlist files of the input training set Tr and test set Ts are first preprocessed, the search sub-module performs a depth-first search on them to obtain a tree graph G representing the interconnection relations of the different logic gates, the temporary-path sub-module then generates the unlabeled path sets of the training set Tr and the test set Ts, and finally the label sub-module assigns labels to the unlabeled paths to generate the labeled path sets of the training set Tr and the test set Ts;
a model generation module, for constructing and training the TextCNN model, comprising a vectorization sub-module, a model construction sub-module and a model training sub-module; the vectorization sub-module first generates a vocabulary file from the path set of the training set Tr produced by the label sub-module, the model construction sub-module constructs and initializes the TextCNN model, and finally the model training sub-module inputs the path set into the model and completes its training;
a pre-detection module, for obtaining the pre-detection result of the test set Ts, comprising a storage sub-module, a pre-detection sub-module and an output sub-module; the storage sub-module first adds a save operation to the last fully connected layer of the TextCNN model built by the model construction sub-module so that the pre-detection result can be recorded, the pre-detection sub-module then pre-detects the paths in the path set to obtain the preliminary detection result set {P_TP, P_FP, P_TN, P_FN}, and finally the output sub-module outputs the set P_TP of correctly identified Trojan paths as the pre-detection result;
a path division module, for dividing the result paths output by the output sub-module into short paths and narrowing the positioning range, comprising a sequencing sub-module, a division sub-module and a virtual-coordinate sub-module; the paths in the pre-detection result P_TP output by the output sub-module are numbered by the sequencing sub-module and divided into several short paths by the division sub-module, and finally the virtual-coordinate sub-module sets a virtual positioning coordinate for each short path;
a positioning module, for completing the positioning of the Trojan, comprising a loading sub-module and an output sub-module; the loading sub-module first loads the short paths into the TextCNN model trained by the model generation module, and after prediction the output sub-module selects the paths predicted as Trojan paths and outputs their corresponding virtual positioning coordinates, completing the positioning.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111412498.9A CN114065308B (en) | 2021-11-25 | 2021-11-25 | Gate-level hardware Trojan horse positioning method and system based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111412498.9A CN114065308B (en) | 2021-11-25 | 2021-11-25 | Gate-level hardware Trojan horse positioning method and system based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114065308A CN114065308A (en) | 2022-02-18 |
CN114065308B true CN114065308B (en) | 2024-08-02 |
Family
ID=80276358
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111412498.9A Active CN114065308B (en) | 2021-11-25 | 2021-11-25 | Gate-level hardware Trojan horse positioning method and system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114065308B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109684834A (en) * | 2018-12-21 | 2019-04-26 | 福州大学 | A kind of gate leve hardware Trojan horse recognition method based on XGBoost |
CN113486347A (en) * | 2021-06-30 | 2021-10-08 | 福州大学 | Deep learning hardware Trojan horse detection method based on semantic understanding |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190272375A1 (en) * | 2019-03-28 | 2019-09-05 | Intel Corporation | Trust model for malware classification |
CN113591084B (en) * | 2021-07-26 | 2023-08-04 | 福州大学 | Transformer malicious chip identification method and system based on circuit path statement |
- 2021-11-25: CN application CN202111412498.9A filed; granted as patent CN114065308B (status: Active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109684834A (en) * | 2018-12-21 | 2019-04-26 | 福州大学 | A kind of gate leve hardware Trojan horse recognition method based on XGBoost |
CN113486347A (en) * | 2021-06-30 | 2021-10-08 | 福州大学 | Deep learning hardware Trojan horse detection method based on semantic understanding |
Also Published As
Publication number | Publication date |
---|---|
CN114065308A (en) | 2022-02-18 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | GR01 | Patent grant | 