-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding dependency walker and test cases #56
Conversation
utils/dependency_walker.go
Outdated
if len(ruleset.Rules) == 0 { | ||
return ast.RuleSet{}, fmt.Errorf("ruleset does not contain any rules") | ||
} | ||
// Make sure there are no duplicates in ruleset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be necessary. Duplicate rules are now found during the compilation process and the AST should not contain duplicate rules.
utils/dependency_walker.go
Outdated
|
||
// GetDependencyChainForRules will recursively find dependencies for a list of rules | ||
// returnDepAsPrivate will dictate if dependent rules are marked as private | ||
func GetDependencyChainForRules(ruleset ast.RuleSet, returnDepAsPrivate bool, ruleNames ...string) (ast.RuleSet, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the intention of GetDependencyChainForRules
is returning the rules in ruleNames
plus any other rules on which they depend on, directly or indirectly. However, the documentation does not mention that the AST nodes returned are not a copy of the original ones, they are simply pointers to the original nodes. This is very important information for the user of this function because modifying any of the nodes in the returned ruleset means that the same node is modified in the input ruleset.
For this same reason I don't think that this function should receive a returnDepAsPrivate
argument. This could be acceptable if the returned AST is a copy, but given that they are not a copy, it means that GetDependencyChainForRules
have the side-effect of modifying the input ruleset. I wouldn't expect such a side effect from this function.
Also, setting those rules as private is a very specific use case and most users of this function won't ever need it. You can always iterate the rules in the resulting ast.RuleSet
and change them to private if you want. It's better to keep the API simple and straight to the point.
utils/dependency_walker.go
Outdated
return final, nil | ||
} | ||
|
||
// OrderRules will take in a ruleset and ensure the order of the rules is correct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation is not clear enough about what's the correct order. Someone with enough experience in YARA can infer that it means that rule "X" must appear before rule "Y" if "Y" depends on "X", but this should be clearly explained in the function. I would also name it SortRules
, which is less ambiguous, because "order" is both a verb and noun, while "sort" is a verb.
However I think that this function shouldn't be necessary. It may be useful in situations in which you are modifying of building the ast.RuleSet
yourself, but it shouldn't be a necessary step in the implementation of GetDependencyChainForRules
, if you asume the input ast.RuleSet
has the correct order already.
} | ||
|
||
// addNodeChildrenToQue adds a nodes children to the queue | ||
func addNodeChildrenToQue(queue *[]queueT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looking at this function's signature or documentation is not clear what it does. The name implies that the children of some node are added the queue, but which node? By looking at the implementation is clear that its the children of the first node in the queue, but it requires you to read the function's code.
// Adds node's children to queue.
addChildrenToQueue(node ast.Node, queue *[]queueT)
utils/dependency_walker.go
Outdated
return rs | ||
} | ||
|
||
// GetDependenciesForRules will find all the dependencies (rules & modules) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation for this function could be improved. It's worth mentioning explicitly that the returned ruleset contains only the rules whose identifiers are included in ruleNames
together with their direct and indirect dependencies, and the ruleset's imports will contain only the modules that are used by the returned rules.
utils/dependency_walker.go
Outdated
"github.com/VirusTotal/gyp/ast" | ||
) | ||
|
||
var conditionKeywords = []string{"int8", "int16", "int32", "uint8", "uint16", "uint32", "int8be", "int16be", "int32be", "uint8be", "uint16be", "uint32be"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mainly for identifying built-in functions like intXX
from other identifiers, so I think it's worth improving the parser by including more information about each identifier in the AST itself. The need for this array comes from a lack of features in the AST. Each ast.Identifier
struct could contain not only the name, but also the kind of identifier if known. During the parsing we are able to know if an identifier is a rule identifier, or a built-in function, for example.
@plusvic thanks for the detailed feedback. I've addressed all of the comments you made in the commits I just added to this PR. Some main changes include:
Let me know if there are any other changes I should make or if I missed anything. Thanks! |
utils/dependency_walker.go
Outdated
} | ||
|
||
// GetDependencyChainForRules will recursively find dependencies for a list of rules | ||
// returnDepAsPrivate will dictate if dependent rules are marked as private |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation still mentions returnDepAsPrivate
which is not an argument anymore.
// GetDependencyChainForRules will recursively find dependencies for a list of rules | ||
// returnDepAsPrivate will dictate if dependent rules are marked as private | ||
func GetDependencyChainForRules(ruleset ast.RuleSet, ruleNames ...string) (ast.RuleSet, error) { | ||
queue := make(map[string]struct{}) // List of rules to get dependencies for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why using a map as a queue? It is because you don't want to insert duplicate items in the queue? In that case I would rename queue
to something like pendingRules
, because the name queue
is a bit misleading as this is not a queue actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because you don't want to insert duplicate items in the queue?
Yes I don't want to insert duplicate items - I'll rename this to pendingRules.
utils/dependency_walker.go
Outdated
// GetRuleIdentifiers will find all the identifiers (excluding ForLoop | ||
// variables and Builtin FuncCalls) and the number of times each identifier is | ||
// seen for a given YARA rule | ||
func GetRuleIdentifiers(rule ast.Rule) map[string]int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetRuleIdentifiers
is a confusing name, it seems to suggest that the function is about the identifiers for the rules themselves, but it's about identifiers used by a rule. So, something like GetUsedIdentifiers
is more descriptive.
utils/dependency_walker.go
Outdated
|
||
// GetDependencyChainForRules will recursively find dependencies for a list of rules | ||
// returnDepAsPrivate will dictate if dependent rules are marked as private | ||
func GetDependencyChainForRules(ruleset ast.RuleSet, ruleNames ...string) (ast.RuleSet, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about naming this function GetRulesSubset
? I would say that the main purpose of this function is extracting a subset of rules from a ast.RuleSet
, of course this implies including all the dependencies in the returned ast.RuleSet
, but still the main of this function is returning a subset of rules.
utils/dependency_walker.go
Outdated
return results, nil | ||
} | ||
|
||
// SortRules will take in a ruleset and ensure the order of the rules is correct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I miss a more detailed explanation of what does a "correct" order means. I guess that it means that it guarantees that if rule A depends on rule B, the latter will appear first, but what about rules that don't depend on each other? What's the criteria followed for sorting a ruleset where all rules are completely independent? Are they sorted alphabetically? Do they remain in the same order as in the input ast.RuleSet
? These are questions that come to my mind when I see this function, and they should be addressed in the documentation.
The purpose of this code is to return a list of dependent rules for a given rule. This code takes in a ruleset and a list of rule identifiers to get dependencies for. For each identifier the code will recursively get the the rules that it depends on. Happy to implement any changes/feedback :D
See #48