Added new SAST scanner `semgrep` by malexmave · Pull Request #744 · secureCodeBox/secureCodeBox · GitHub
Merged
Commits
26 commits
d7d9ce4
Add early WIP for semgrep scanner
malexmave Oct 12, 2021
b0a34d6
First version of parser + unit tests
malexmave Oct 13, 2021
ddaac78
Fix parser and tests
malexmave Oct 14, 2021
23b6f50
Update values.yaml to SCBv3 syntax
malexmave Oct 14, 2021
3ae30a4
Add support for initContainers to test framework
malexmave Oct 15, 2021
bdb3fd1
Add semgrep integration tests
malexmave Oct 15, 2021
297825b
Add semgrep tests to CI
malexmave Oct 15, 2021
1ba9e06
Add example with findings
malexmave Oct 15, 2021
61a7f28
Upgrade semgrep to 0.69.1
malexmave Oct 18, 2021
af1acd8
Add README for semgrep
malexmave Oct 18, 2021
b819bdb
Remove matched lines from output
malexmave Oct 19, 2021
e26f898
Add cascadingRules to docs
malexmave Oct 19, 2021
af347f1
Update integration tests to use local file
malexmave Oct 19, 2021
b18d5a6
WIP: semgrep support for DefectDojo hook
malexmave Oct 19, 2021
fbe01be
Add unnecessary files to helmignore
malexmave Oct 20, 2021
3e35a9e
Create folder for docs
malexmave Oct 21, 2021
c413f09
Rename example file to make it findable
malexmave Oct 21, 2021
787043e
Update documentation metadata
malexmave Oct 21, 2021
636dea0
Fix typo :(
malexmave Oct 21, 2021
0d1f3cd
Update semgrep to 0.70.0
malexmave Oct 21, 2021
9648126
Add semgrep to DD-supported scan types
malexmave Oct 21, 2021
7f1af9e
Updating Helm Docs
malexmave Oct 21, 2021
da66544
Fix templating in helm-docs
malexmave Oct 22, 2021
85d44ff
Add generated documentation
malexmave Oct 22, 2021
364e661
Updating Helm Docs
malexmave Oct 22, 2021
fd16358
Add scb-bot support for semgrep
malexmave Oct 22, 2021
1 change: 1 addition & 0 deletions .github/workflows/ci.yaml
@@ -290,6 +290,7 @@ jobs:
- nmap
- nuclei
- screenshooter
- semgrep
- ssh-scan
- sslyze
- trivy
1 change: 1 addition & 0 deletions .github/workflows/scb-bot.yaml
@@ -23,6 +23,7 @@ jobs:
# - wpscan
# - zap
# - zap-advanced
# - semgrep
# These are commented out for the moment to avoid accidental multiple erroneous PRs
# missing scanners are : nmap, nikto, typo3scan
steps:
1 change: 1 addition & 0 deletions hooks/persistence-defectdojo/.helm-docs.gotmpl
@@ -35,6 +35,7 @@ These are:
- SSLyze
- Trivy
- Gitleaks
- Semgrep

After uploading the results to DefectDojo, it will use the findings parsed by DefectDojo to overwrite the
original secureCodeBox findings identified by the parser. This lets you access the finding metadata like the false
1 change: 1 addition & 0 deletions hooks/persistence-defectdojo/README.md
@@ -46,6 +46,7 @@ These are:
- SSLyze
- Trivy
- Gitleaks
- Semgrep

After uploading the results to DefectDojo, it will use the findings parsed by DefectDojo to overwrite the
original secureCodeBox findings identified by the parser. This lets you access the finding metadata like the false
@@ -18,6 +18,7 @@ public enum ScanNameMapping {
NIKTO("nikto", ScanType.NIKTO_SCAN),
NUCLEI("nuclei", ScanType.NUCLEI_SCAN),
WPSCAN("wpscan", ScanType.WPSCAN),
SEMGREP("semgrep", ScanType.SEMGREP_JSON_REPORT),
GENERIC(null, ScanType.GENERIC_FINDINGS_IMPORT)
;

158 changes: 158 additions & 0 deletions scanners/semgrep/.helm-docs.gotmpl
@@ -0,0 +1,158 @@
{{- /*
SPDX-FileCopyrightText: 2021 iteratec GmbH

SPDX-License-Identifier: Apache-2.0
*/ -}}

{{- define "extra.docsSection" -}}
---
title: "Semgrep"
category: "scanner"
type: "Repository"
state: "released"
appVersion: "{{ template "chart.appVersion" . }}"
usecase: "Static Code Analysis"
---

![Semgrep logo](https://raw.githubusercontent.com/returntocorp/semgrep-docs/main/static/img/semgrep-icon-text-horizontal.svg)

{{- end }}

{{- define "extra.dockerDeploymentSection" -}}
## Supported Tags
- `latest` (represents the latest stable release build)
- tagged releases, e.g. `{{ template "chart.appVersion" . }}`
{{- end }}

{{- define "extra.chartAboutSection" -}}
## What is Semgrep?
Semgrep ("semantic grep") is a static source code analyzer that can be used to search for specific patterns in code.
It allows you to either [write your own rules](https://semgrep.dev/learn), or use one of the [many pre-defined rulesets](https://semgrep.dev/r) curated by the semgrep team.

To learn more about semgrep, visit [semgrep.dev](https://semgrep.dev).

{{- end }}

{{- define "extra.scannerConfigurationSection" -}}
## Scanner Configuration

Semgrep requires one or more ruleset(s) to run its scans.
Refer to the [semgrep rule database](https://semgrep.dev/r) for more details.
A good starting point would be [p/ci](https://semgrep.dev/p/ci) (for security checks with a low false-positive rate) or [p/security-audit](https://semgrep.dev/p/security-audit) (for a more comprehensive security audit, which may include more false-positive results).
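
As a minimal sketch, switching to the more comprehensive ruleset only changes the value passed to `-c` in the `parameters` of a `Scan`. The target path `/repo/flask-app` is an assumption and has to match the location where the source code is provisioned:

```yaml
# Fragment of a Scan spec, not a complete resource
scanType: "semgrep"
parameters:
  - "-c"
  - "p/security-audit" # Broader audit, expect more false positives than p/ci
  - "/repo/flask-app"  # Assumed path of the provisioned source code
```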


Semgrep needs access to the source code to run its analysis.
To use it with secureCodeBox, you thus need a way to provision the data into the scan container.
The recommended method is to use `initContainers` to clone a VCS repository.
The simplest example, using a public Git repository from GitHub, looks like this:

```yaml
apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "semgrep-vulnerable-flask-app"
spec:
  # Specify a Kubernetes volume that will be shared between the scanner and the initContainer
  volumes:
    - name: repository
      emptyDir: {}
  # Mount the volume in the scan container
  volumeMounts:
    - mountPath: "/repo/"
      name: repository
  # Specify an init container to clone the repository
  initContainers:
    - name: "provision-git"
      # Use an image that includes git
      image: bitnami/git
      # Mount the same volume we also use in the main container
      volumeMounts:
        - mountPath: "/repo/"
          name: repository
      # Specify the clone command and clone into the volume, mounted at /repo/
      command:
        - git
        - clone
        - "https://github.com/we45/Vulnerable-Flask-App"
        - /repo/flask-app
  # Parameterize the semgrep scan itself
  scanType: "semgrep"
  parameters:
    - "-c"
    - "p/ci"
    - "/repo/flask-app"
```

If your repository requires authentication to clone, you will have to give the initContainer access to some method of authentication.
This could be a personal access token ([GitHub](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token), [GitLab](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html)), project access token ([GitLab](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html)), deploy key ([GitHub](https://docs.github.com/en/developers/overview/managing-deploy-keys#deploy-keys) / [GitLab](https://docs.gitlab.com/ee/user/project/deploy_keys/)), deploy token ([GitLab](https://docs.gitlab.com/ee/user/project/deploy_tokens/)), or a server-to-server token ([GitHub](https://docs.github.com/en/developers/overview/managing-deploy-keys#server-to-server-tokens)).
Due to the large variety of options, we do not provide documentation for all of them here.
Refer to the linked documentation for details on the different methods, and remember to use [Kubernetes secrets](https://kubernetes.io/docs/concepts/configuration/secret/) to manage keys and tokens.
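
As one possible sketch, the following `Scan` clones a private GitHub repository using a personal access token. It assumes that a Kubernetes secret named `github-access-token` with the key `token` already exists in the namespace of the scan. The scan name and `your-org/your-private-repo` are placeholders, not real resources:

```yaml
apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: "semgrep-private-repo" # Hypothetical name
spec:
  volumes:
    - name: repository
      emptyDir: {}
  volumeMounts:
    - mountPath: "/repo/"
      name: repository
  initContainers:
    - name: "provision-git"
      image: bitnami/git
      volumeMounts:
        - mountPath: "/repo/"
          name: repository
      # Kubernetes expands $(GITHUB_TOKEN) from the env section below
      command:
        - git
        - clone
        - "https://$(GITHUB_TOKEN)@github.com/your-org/your-private-repo" # Placeholder repository
        - /repo/
      env:
        - name: GITHUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: github-access-token # Assumed secret name
              key: token
  scanType: "semgrep"
  parameters:
    - "-c"
    - "p/ci"
    - "/repo/"
```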

## Cascading Rules
By default, the semgrep scanner does not install any [cascading rules](docs/hooks/cascading-scans), because some aspects of a semgrep scan (such as the ruleset to use) should be customized.
However, you can easily create your own cascading rule, for example to run semgrep on the output of [git-repo-scanner](docs/scanners/git-repo-scanner).
As a starting point, consider the following cascading rule, which scans all public GitHub repositories found by git-repo-scanner using semgrep's p/ci ruleset:

```yaml
apiVersion: "cascading.securecodebox.io/v1"
kind: CascadingRule
metadata:
  name: "semgrep-public-github-repos"
  labels:
    securecodebox.io/invasive: non-invasive
    securecodebox.io/intensive: medium
spec:
  matches:
    anyOf:
      # We want to scan public GitHub repositories. Change "public" to "private" to scan private repos instead
      - name: "GitHub Repo"
        attributes:
          visibility: public
  scanSpec:
    # Configure the scanSpec for semgrep
    scanType: "semgrep"
    parameters:
      - "-c"
      - "p/ci" # Change this to use a different rule set
      - "/repo/"
    volumes:
      - name: repo
        emptyDir: {}
    volumeMounts:
      - name: repo
        mountPath: "/repo/"
    initContainers:
      - name: "git-clone"
        image: bitnami/git
        # The command assumes that GITHUB_TOKEN contains a GitHub access token with access to the repository.
        # GITHUB_TOKEN is set below in the "env" section.
        # If you do not want to use an access token, remove it from the URL below.
        command:
          - git
          - clone
          - "https://$(GITHUB_TOKEN)@github.com/{{ "{{{" }}attributes.full_name{{ "}}}" }}"
          - /repo/
        volumeMounts:
          - mountPath: "/repo/"
            name: repo
        # Load the GITHUB_TOKEN from the Kubernetes secret with the name "github-access-token"
        # Create this secret using, for example:
        # echo -n 'YOUR TOKEN GOES HERE' > github-token.txt && kubectl create secret generic github-access-token --from-file=token=github-token.txt
        # IMPORTANT: Ensure that github-token.txt does not end with a newline. Using "echo -n" to create it takes care of this automatically.
        # However, if you create it with an editor, some editors (most notably, vim) will add a hidden newline at the end of the file, which will cause issues.
        env:
          - name: GITHUB_TOKEN
            valueFrom:
              secretKeyRef:
                name: github-access-token
                key: token
```

Use this configuration as a baseline for your own rules.
{{- end }}

{{- define "extra.chartConfigurationSection" -}}
{{- end }}

{{- define "extra.scannerLinksSection" -}}
{{- end }}
40 changes: 40 additions & 0 deletions scanners/semgrep/.helmignore
@@ -0,0 +1,40 @@
# SPDX-FileCopyrightText: 2021 iteratec GmbH
#
# SPDX-License-Identifier: Apache-2.0
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
# Node.js files
node_modules/*
package.json
package-lock.json
src/*
config/*
Dockerfile
.dockerignore
*.tar
parser/*
scanner/*
integration-tests/*
examples/*
docs/*
Makefile
45 changes: 45 additions & 0 deletions scanners/semgrep/Chart.yaml
@@ -0,0 +1,45 @@
apiVersion: v2
name: semgrep
description: A Helm chart for the semgrep semantic code analyzer that integrates with the secureCodeBox

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: "v3.1.0-alpha1"

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "0.70.0"

versionApi: https://api.github.com/repos/returntocorp/semgrep/releases/latest

kubeVersion: ">=v1.11.0-0"

home: https://docs.securecodebox.io/docs/scanners/semgrep
icon: https://docs.securecodebox.io/img/integrationIcons/semgrep.svg # TODO: Add this

sources:
- https://github.com/secureCodeBox/secureCodeBox

maintainers:
- name: iteratec GmbH
- email: secureCodeBox@iteratec.com

keywords:
- security
- semgrep
- SAST
- staticanalysis
- secureCodeBox
15 changes: 15 additions & 0 deletions scanners/semgrep/Makefile
@@ -0,0 +1,15 @@
#!/usr/bin/make -f

include_guard = set # Always include this line (checked in the makefile framework)
scanner = semgrep

include ../../scanners.mk # Ensures that all the default makefile targets are included

integration-tests:
	@echo ".: 🩺 Starting integration test in kind namespace 'integration-tests'."
	kubectl -n integration-tests delete scans --all
	cd ../../tests/integration/ && npm ci
	cd ../../scanners/${scanner}
	kubectl -n integration-tests create configmap semgrep-test-file --from-file=integration-tests/testfile.py
	npx --yes --package jest@$(JEST_VERSION) jest --verbose --ci --colors --coverage --passWithNoTests ${scanner}/integration-tests
	kubectl -n integration-tests delete configmap semgrep-test-file