Is your feature request related to a problem? Please describe.
As a secureCodeBox user, I am heavily using the git-repo-scanner in combination with the gitleaks scanner via CascadingRules. When scanning large GitLab or GitHub organisations with thousands of repos on a regular basis (daily, weekly), this leads to thousands of independent gitleaks scans.
To save resources on the gitleaks side, we extended gitleaks to analyse only the commits made within a given timeframe (last 24h, last week, ...): gitleaks/gitleaks#498
The problem is that the git-repo-scanner always returns all git repositories, even if they had no activity within that timeframe. To save even more resources, it would be great to be able to configure an activity timeframe and use it as a filter.
Describe the solution you'd like
Add the following configuration options to the git-repo-scanner:
- `--activity-since-duration=`: Return git repo findings with repo activity (e.g. commits) more recent than a specific date, expressed as a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with an optional fraction and a unit suffix, such as '300ms', '-1.5h' or '2h45m'. Valid time units are 'ns', 'us' (or 'µs'), 'ms', 's', 'm', 'h'.
- `--activity-until-duration=`: Return git repo findings with repo activity (e.g. commits) older than a specific date, expressed as a duration (now + duration). The duration string uses the same format as `--activity-since-duration`.
Remark 🚧: The exact semantics of the timeframe definition may depend on implementation details and can differ slightly. It is not important to implement exactly these semantics.
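The duration format described above is the one accepted by Go's `time.ParseDuration`, so a scanner written in Python would need its own parser. Below is a minimal sketch of one possible version. `parse_duration` and `activity_cutoff` are hypothetical names, and reading `24h` as "the last 24 hours" (now minus the duration, matching the ScheduledScan example) is an assumption; the literal "now + duration" wording would instead require `-24h`.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Seconds per unit for the units listed in the issue (Go's time.ParseDuration set).
_UNIT_SECONDS = {
    "ns": 1e-9,
    "us": 1e-6,
    "\u00b5s": 1e-6,  # 'µs' written with the micro sign
    "\u03bcs": 1e-6,  # 'μs' written with the Greek letter mu
    "ms": 1e-3,
    "s": 1.0,
    "m": 60.0,
    "h": 3600.0,
}
# Longest units first so that 'ms' is tried before 'm' and 's'.
_UNIT_PATTERN = "|".join(
    re.escape(u) for u in sorted(_UNIT_SECONDS, key=len, reverse=True)
)
# One term: a decimal number with an optional fraction, followed by a unit.
_TERM = re.compile(r"([0-9]+\.?[0-9]*|\.[0-9]+)(" + _UNIT_PATTERN + ")")


def parse_duration(text: str) -> timedelta:
    """Parse a Go-style duration such as '300ms', '-1.5h' or '2h45m'.

    Note: timedelta has microsecond resolution, so 'ns' values get rounded.
    """
    s = text.strip()
    sign = 1
    if s and s[0] in "+-":
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    if s == "0":
        return timedelta(0)
    if not s:
        raise ValueError(f"invalid duration: {text!r}")
    total_seconds = 0.0
    pos = 0
    while pos < len(s):
        match = _TERM.match(s, pos)
        if match is None:  # missing unit or unexpected character
            raise ValueError(f"invalid duration: {text!r}")
        number, unit = match.groups()
        total_seconds += float(number) * _UNIT_SECONDS[unit]
        pos = match.end()
    return sign * timedelta(seconds=total_seconds)


def activity_cutoff(duration: str, now: Optional[datetime] = None) -> datetime:
    """Hypothetical helper: turn '24h' into the timestamp 'now minus 24h'.

    ASSUMPTION: the duration is read as a look-back window, not literally
    as 'now + duration'.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - parse_duration(duration)
```

With this reading, `--activity-since-duration 24h` and `--activity-until-duration 1h` would select repos whose last activity lies between 24 hours and 1 hour ago.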
Implementation Hints:
- The GitLab Python client has helpful options, such as sorting all projects by `last_activity_at` (https://python-gitlab.readthedocs.io/en/stable/gl_objects/projects.html#examples):
  `gl.projects.list(all=True, include_subgroups=True, order_by='last_activity_at', sort='desc')`
  This could be used at:
  `projects = gl.groups.get(args.group).projects.list(all=True, include_subgroups=True)`
- The GitHub Python client has helpful options, such as sorting all repositories by last update (https://pygithub.readthedocs.io/en/latest/github_objects/Organization.html#github.Organization.Organization.get_repos):
  `get_repos(type='all', sort='updated', direction='desc')`
  This could be used at:
  `repos: PaginatedList[Repository] = org.get_repos(type='all')`
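Sorting newest first, as in the hints above, allows an early-stop filter: iteration can end at the first repository older than the cutoff. Below is a minimal Python sketch, assuming `since`/`until` are timezone-aware UTC datetimes computed elsewhere. `filter_by_activity`, `active_gitlab_projects` and `active_github_repos` are hypothetical names, not existing git-repo-scanner functions; the client calls are the ones quoted in the hints.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def as_utc(ts: datetime) -> datetime:
    """Treat naive datetimes (e.g. from older PyGithub versions) as UTC."""
    return ts if ts.tzinfo is not None else ts.replace(tzinfo=timezone.utc)


def parse_gitlab_timestamp(value: str) -> datetime:
    """Parse GitLab ISO 8601 strings such as '2021-06-01T12:00:00.000Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_by_activity(
    repos: Iterable[T],
    get_ts: Callable[[T], datetime],
    since: datetime,
    until: Optional[datetime] = None,
) -> List[T]:
    """Keep repos whose activity timestamp lies in [since, until].

    `repos` must be sorted newest first by the same timestamp that `get_ts`
    returns; that is what makes stopping at the first older repo safe.
    """
    result: List[T] = []
    for repo in repos:
        ts = as_utc(get_ts(repo))
        if ts < since:
            break  # sorted newest first: all remaining repos are older still
        if until is None or ts <= until:
            result.append(repo)
    return result


# Hypothetical wiring: `gl` is a gitlab.Gitlab instance and `org` a PyGithub
# Organization, both created elsewhere as the scanner already does.
def active_gitlab_projects(gl, group, since, until=None):
    projects = gl.groups.get(group).projects.list(
        all=True, include_subgroups=True, order_by="last_activity_at", sort="desc"
    )
    return filter_by_activity(
        projects, lambda p: parse_gitlab_timestamp(p.last_activity_at), since, until
    )


def active_github_repos(org, since, until=None):
    repos = org.get_repos(type="all", sort="updated", direction="desc")
    return filter_by_activity(repos, lambda r: r.updated_at, since, until)
```

The timestamp returned by `get_ts` has to match the sort key, otherwise the early `break` can drop active repos. GitHub's `pushed_at` (together with `sort='pushed'`) may track commits more closely than `updated_at`. With `all=True`, python-gitlab still fetches every page up front, so there the early stop only saves filtering work, whereas PyGithub's `PaginatedList` fetches pages lazily and can skip the remaining requests.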
Example ScheduledScan Configuration
The following ScheduledScan example starts a new git-repo-scanner scan every 24h and cascades all results into follow-up gitleaks scans. The gitleaks scanner then analyses all commits made within the last 24h.
```yaml
apiVersion: "execution.securecodebox.io/v1"
kind: ScheduledScan
metadata:
  name: "scb-github-repos"
  labels:
    product: "secureCodeBox"
spec:
  interval: 24h
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  scanSpec:
    scanType: "git-repo-scanner"
    parameters:
      # configuration of the target git system
      - "--git-type"
      - "github"
      - "--organization"
      - "secureCodeBox"
      # Provide an access token from ENV Vars defined via secret
      - "--access-token"
      - "$(GITHUB_TOKEN)"
      # Filter findings - only return git repos with activity (commits) within a given timeframe
      - "--activity-since-duration"
      - "24h"
    env:
      - name: GITHUB_TOKEN
        valueFrom:
          secretKeyRef:
            name: github-access-token
            key: token
    cascades:
      matchLabels:
        securecodebox.io/intensive: medium
        securecodebox.io/invasive: non-invasive
```

```yaml
apiVersion: "cascading.securecodebox.io/v1"
kind: CascadingRule
metadata:
  name: "gitleaks-github-scan-public"
  labels:
    securecodebox.io/invasive: non-invasive
    securecodebox.io/intensive: medium
spec:
  matches:
    anyOf:
      - name: "GitHub Repo"
        attributes:
          visibility: public
  scanSpec:
    scanType: "gitleaks"
    parameters:
      - "--repo-url"
      - "{{{attributes.web_url}}}"
      # Apply all available rules
      - "--config-path"
      - "/home/config_all.toml"
      # Redact secrets from log messages and leaks
      - "--redact"
      # Only scan commits since the last 24h
      - "--commit-since-duration"
      - "24h"
      # Provide an access token from ENV Vars defined via secret
      - "--access-token"
      - "$(GITHUB_TOKEN)"
    env:
      - name: GITHUB_TOKEN
        valueFrom:
          secretKeyRef:
            name: github-access-token
            key: token
```

Describe alternatives you've considered
none
Additional context
none