⚙️ Extend the git-repo-scanner with an activity filter · Issue #320 · secureCodeBox/secureCodeBox · GitHub
⚙️ Extend the git-repo-scanner with an activity filter #320

@rfelber

Description

Is your feature request related to a problem? Please describe.

As a secureCodeBox user, I rely heavily on the git-repo-scanner in combination with the gitleaks scanner via cascadingRules. When scanning large GitLab or GitHub organisations with thousands of repos on a regular basis (daily, weekly), this leads to thousands of independent gitleaks scans.

To save resources on the gitleaks side, we extended gitleaks so that it can analyse only the commits made within a given timeframe (last 24h, last week, ...): gitleaks/gitleaks#498

The problem is that the git-repo-scanner always returns all git repositories, even if they had no activity within that timeframe. To save even more resources, it would be great to be able to configure an activity timeframe and use it as a filter.

Describe the solution you'd like

Add the following configuration options to the git-repo-scanner:

--activity-since-duration= Return git repo findings with repo activity (e.g. commits) more recent than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each
                               with optional fraction and a unit suffix, such as '300ms', '-1.5h' or '2h45m'. Valid time units are 'ns', 'us' (or 'µs'), 'ms', 's', 'm', 'h'.
--activity-until-duration= Return git repo findings with repo activity (e.g. commits) older than a specific date expressed by a duration (now + duration). A duration string is a possibly signed sequence of decimal numbers, each with
                               optional fraction and a unit suffix, such as '300ms', '-1.5h' or '2h45m'. Valid time units are 'ns', 'us' (or 'µs'), 'ms', 's', 'm', 'h'.
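
To make the duration format above concrete, here is a minimal Python sketch of a parser for such strings. The function name `parse_duration` is hypothetical and not part of the git-repo-scanner. It assumes a single optional leading sign and the unit suffixes listed above. Python's `timedelta` only has microsecond resolution, so nanosecond values are rounded.

```python
import re
from datetime import timedelta

# Unit suffixes from the option description, mapped to seconds.
_UNIT_SECONDS = {
    "ns": 1e-9,
    "us": 1e-6,
    "µs": 1e-6,
    "ms": 1e-3,
    "s": 1.0,
    "m": 60.0,
    "h": 3600.0,
}

# One "number + unit" token, e.g. "2h", "1.5h" or "300ms".
_TOKEN = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_duration(text: str) -> timedelta:
    """Parse a duration string such as '300ms', '-1.5h' or '2h45m'.

    Hypothetical helper for illustration only.
    """
    sign = 1
    rest = text
    if rest[:1] in ("+", "-"):
        sign = -1 if rest[0] == "-" else 1
        rest = rest[1:]
    if not rest:
        raise ValueError(f"invalid duration: {text!r}")

    seconds = 0.0
    pos = 0
    while pos < len(rest):
        match = _TOKEN.match(rest, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    return timedelta(seconds=sign * seconds)
```

For example, `parse_duration("2h45m")` yields a `timedelta` of 2 hours and 45 minutes, and an unsupported unit such as `"1d"` raises a `ValueError`.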

Remark 🚧 : The concrete semantics of the timeframe definition may depend on implementation details and might differ slightly. It is not important to implement exactly these semantics.
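
As one possible reading of the two options, the filter could be sketched in Python as follows. This is only an illustration under assumptions: the function `filter_by_activity` and the finding attribute `last_activity_at` are hypothetical names, and each duration is interpreted as "this long ago", which matches the `24h` value used in the example configuration but is not the only valid interpretation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def filter_by_activity(
    findings: list,
    now: datetime,
    since: Optional[timedelta] = None,
    until: Optional[timedelta] = None,
) -> list:
    """Keep findings whose last activity lies inside the requested window.

    Assumption: each finding is a dict whose "attributes" contain a
    timezone-aware datetime under the hypothetical key "last_activity_at".
    """
    # Earliest and latest allowed activity timestamps (None = unbounded).
    start = now - since if since is not None else None
    end = now - until if until is not None else None

    kept = []
    for finding in findings:
        last_activity = finding["attributes"]["last_activity_at"]
        if start is not None and last_activity < start:
            continue  # no activity since the start of the window
        if end is not None and last_activity > end:
            continue  # activity is more recent than the window allows
        kept.append(finding)
    return kept
```

Called with `since=timedelta(hours=24)` and `now=datetime.now(timezone.utc)`, this would return only the repositories that had activity in the last 24 hours, so the cascading gitleaks scans would be started for those alone.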

Implementation Hints:

Example ScheduledScan Configuration

The following ScheduledScan example starts a new git-repo-scanner scan every 24h and then cascades all results into follow-up gitleaks scans. The gitleaks scanner then analyses all commits made within the last 24h.

apiVersion: "execution.securecodebox.io/v1"
kind: ScheduledScan
metadata:
  name: "scb-github-repos"
  labels:
    product: "secureCodeBox"
spec:
  interval: 24h
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  scanSpec:
    scanType: "git-repo-scanner"
    parameters:
      # configuration of the target git system
      - "--git-type"
      - "github"
      - "--organization"
      - "secureCodeBox"
      # Provide an access token from ENV Vars defined via secret
      - "--access-token"
      - "$(GITHUB_TOKEN)"
      # Filter findings - only return git repos with activity (commits) within a given timeframe 
      - "--activity-since-duration"
      - "24h"
    env:
      - name: GITHUB_TOKEN
        valueFrom:
          secretKeyRef:
            name: github-access-token
            key: token
    cascades:
      matchLabels:
        securecodebox.io/intensive: medium
        securecodebox.io/invasive: non-invasive
---
apiVersion: "cascading.securecodebox.io/v1"
kind: CascadingRule
metadata:
  name: "gitleaks-github-scan-public"
  labels:
    securecodebox.io/invasive: non-invasive
    securecodebox.io/intensive: medium
spec:
  matches:
    anyOf:
      - name: "GitHub Repo"
        attributes:
          visibility: public
  scanSpec:
    scanType: "gitleaks"
    parameters:
      - "--repo-url"
      - "{{{attributes.web_url}}}"
      # Apply all available rules
      - "--config-path"
      - "/home/config_all.toml"
      # Redact secrets from log messages and leaks
      - "--redact"
      # Only scan commits since the last 24h
      - "--commit-since-duration"
      - "24h"
      # Provide an access token from ENV Vars defined via secret
      - "--access-token"
      - "$(GITHUB_TOKEN)"
    env:
      - name: GITHUB_TOKEN
        valueFrom:
          secretKeyRef:
            name: github-access-token
            key: token

Describe alternatives you've considered

none

Additional context

none

Metadata

Labels

enhancement (New feature or request), python (Issues based on python implementations), scanner (Implement or update a security scanner)
