[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding dependency walker and test cases #56

Merged
merged 19 commits into from
Jul 21, 2022

Conversation

gazunder
Copy link
Contributor
@gazunder gazunder commented May 2, 2022

The purpose of this code is to return a list of dependent rules for a given rule. This code takes in a ruleset and a list of rule identifiers to get dependencies for. For each identifier the code will recursively get the the rules that it depends on. Happy to implement any changes/feedback :D

See #48

if len(ruleset.Rules) == 0 {
return ast.RuleSet{}, fmt.Errorf("ruleset does not contain any rules")
}
// Make sure there are no duplicates in ruleset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary. Duplicate rules are now found during the compilation process and the AST should not contain duplicate rules.


// GetDependencyChainForRules will recursively find dependencies for a list of rules
// returnDepAsPrivate will dictate if dependent rules are marked as private
func GetDependencyChainForRules(ruleset ast.RuleSet, returnDepAsPrivate bool, ruleNames ...string) (ast.RuleSet, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the intention of GetDependencyChainForRules is returning the rules in ruleNames plus any other rules on which they depend on, directly or indirectly. However, the documentation does not mention that the AST nodes returned are not a copy of the original ones, they are simply pointers to the original nodes. This is very important information for the user of this function because modifying any of the nodes in the returned ruleset means that the same node is modified in the input ruleset.

For this same reason I don't think that this function should receive a returnDepAsPrivate argument. This could be acceptable if the returned AST is a copy, but given that they are not a copy, it means that GetDependencyChainForRules have the side-effect of modifying the input ruleset. I wouldn't expect such a side effect from this function.

Also, setting those rules as private is a very specific use case and most users of this function won't ever need it. You can always iterate the rules in the resulting ast.RuleSet and change them to private if you want. It's better to keep the API simple and straight to the point.

return final, nil
}

// OrderRules will take in a ruleset and ensure the order of the rules is correct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation is not clear enough about what's the correct order. Someone with enough experience in YARA can infer that it means that rule "X" must appear before rule "Y" if "Y" depends on "X", but this should be clearly explained in the function. I would also name it SortRules, which is less ambiguous, because "order" is both a verb and noun, while "sort" is a verb.

However I think that this function shouldn't be necessary. It may be useful in situations in which you are modifying of building the ast.RuleSet yourself, but it shouldn't be a necessary step in the implementation of GetDependencyChainForRules, if you asume the input ast.RuleSet has the correct order already.

}

// addNodeChildrenToQue adds a nodes children to the queue
func addNodeChildrenToQue(queue *[]queueT) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By looking at this function's signature or documentation is not clear what it does. The name implies that the children of some node are added the queue, but which node? By looking at the implementation is clear that its the children of the first node in the queue, but it requires you to read the function's code.

// Adds node's children to queue.
addChildrenToQueue(node ast.Node, queue *[]queueT)

return rs
}

// GetDependenciesForRules will find all the dependencies (rules & modules)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for this function could be improved. It's worth mentioning explicitly that the returned ruleset contains only the rules whose identifiers are included in ruleNames together with their direct and indirect dependencies, and the ruleset's imports will contain only the modules that are used by the returned rules.

"github.com/VirusTotal/gyp/ast"
)

var conditionKeywords = []string{"int8", "int16", "int32", "uint8", "uint16", "uint32", "int8be", "int16be", "int32be", "uint8be", "uint16be", "uint32be"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mainly for identifying built-in functions like intXX from other identifiers, so I think it's worth improving the parser by including more information about each identifier in the AST itself. The need for this array comes from a lack of features in the AST. Each ast.Identifier struct could contain not only the name, but also the kind of identifier if known. During the parsing we are able to know if an identifier is a rule identifier, or a built-in function, for example.

@gazunder
Copy link
Contributor Author
gazunder commented Jun 20, 2022

@plusvic thanks for the detailed feedback. I've addressed all of the comments you made in the commits I just added to this PR.

Some main changes include:

  1. ast.FunctionCall now has a builtin bool value. This allowed us to remove the conditionKeywords from dependency_walker.go Added test case for this too.
  2. GetDependencyChainForRules Now returns a copy of the ruleset, doesn't call SortRules, and doesn't mark rules as private
  3. Documentation improvements

Let me know if there are any other changes I should make or if I missed anything. Thanks!

}

// GetDependencyChainForRules will recursively find dependencies for a list of rules
// returnDepAsPrivate will dictate if dependent rules are marked as private
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation still mentions returnDepAsPrivate which is not an argument anymore.

// GetDependencyChainForRules will recursively find dependencies for a list of rules
// returnDepAsPrivate will dictate if dependent rules are marked as private
func GetDependencyChainForRules(ruleset ast.RuleSet, ruleNames ...string) (ast.RuleSet, error) {
queue := make(map[string]struct{}) // List of rules to get dependencies for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why using a map as a queue? It is because you don't want to insert duplicate items in the queue? In that case I would rename queue to something like pendingRules, because the name queue is a bit misleading as this is not a queue actually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because you don't want to insert duplicate items in the queue?

Yes I don't want to insert duplicate items - I'll rename this to pendingRules.

// GetRuleIdentifiers will find all the identifiers (excluding ForLoop
// variables and Builtin FuncCalls) and the number of times each identifier is
// seen for a given YARA rule
func GetRuleIdentifiers(rule ast.Rule) map[string]int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetRuleIdentifiers is a confusing name, it seems to suggest that the function is about the identifiers for the rules themselves, but it's about identifiers used by a rule. So, something like GetUsedIdentifiers is more descriptive.


// GetDependencyChainForRules will recursively find dependencies for a list of rules
// returnDepAsPrivate will dictate if dependent rules are marked as private
func GetDependencyChainForRules(ruleset ast.RuleSet, ruleNames ...string) (ast.RuleSet, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about naming this function GetRulesSubset? I would say that the main purpose of this function is extracting a subset of rules from a ast.RuleSet, of course this implies including all the dependencies in the returned ast.RuleSet, but still the main of this function is returning a subset of rules.

return results, nil
}

// SortRules will take in a ruleset and ensure the order of the rules is correct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I miss a more detailed explanation of what does a "correct" order means. I guess that it means that it guarantees that if rule A depends on rule B, the latter will appear first, but what about rules that don't depend on each other? What's the criteria followed for sorting a ruleset where all rules are completely independent? Are they sorted alphabetically? Do they remain in the same order as in the input ast.RuleSet? These are questions that come to my mind when I see this function, and they should be addressed in the documentation.

@plusvic plusvic merged commit 75d9d1f into VirusTotal:master Jul 21, 2022
@gazunder gazunder deleted the dependency-walker branch July 23, 2022 21:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants