Actions-Runner-Controller support for Gitea Actions #29567
Comments
k8s hooks are technically already usable with Gitea Actions (meaning there is no documentation; the docker compose examples use dind + docker hooks), see this third-party runner adapter: https://gitea.com/gitea/awesome-gitea/pulls/149. Actions-Runner-Controller would require emulating a bigger set of the internal GitHub Actions API. I actually find it interesting to reverse engineer that product too, but I have never dealt with k8s myself.
|
Interesting. I wasn't aware you could change the runner implementation just like that. Def will look into it. However, given what you said about DinD still being a requirement, I don't think it will change much (we already have our runners on K8s with DinD using an adapted version of gitea/act-runner for k8s, but as mentioned, this comes with many headaches). The goal IMHO would be to be able to start workflows on k8s directly. Possible implementations:
Option one (every job is its own pod) seems like the most promising option in my opinion. |
I meant, I didn't create any k8s mode examples / haven't actually tried it yet. Sorry for the confusion here. The docker container hooks only allow dind for k8s, while the k8s hooks should use the kubernetes api for container management. I still need to look into getting a test setup running. I can imagine
Well, not using act_runner has limitations when you try to use Gitea Actions Extensions (features not present in GitHub Actions). I think option 1 is more likely to happen than option 2. Job scheduling is based on jobs, not on workflows. |
k8s hooks work for me using these files on minikube (arm64): actions-runner-k8s-gitea-sample-files.zip
With clever sharing of the runner credentials volume, you could start a lot of replicas for more parallel runners. This works without dind. Test workflow:
on: push
jobs:
  _:
    runs-on: k8s # <-- Used runner label
    container: ubuntu:latest # <-- Required, maybe the Gitea Actions adapter could insert a default
    steps:
      # Git is needed for actions/checkout to work for Gitea, rest api is not compatible
      - run: apt update && apt install -y git
      - uses: https://github.com/actions/checkout@v3 # <-- Almost the only Gitea Extension supported
      - run: ls -la
      - run: ls -la .github/workflows
The runner-pod-workflow is the job container pod, running directly via k8s. |
Looks promising. I'll give it a shot and share my findings. |
Okay, so... there seem to be some issues with the current setup. Let me share my findings:
- name: GITEA_RUNNER_REGISTRATION_TOKEN
  valueFrom:
    secretKeyRef:
      name: secret_name
      key: secret_key
and creating your secret with (take care: K8s is case sensitive):
apiVersion: v1
kind: Secret
metadata:
  name: secret_name
type: Opaque
stringData:
  secret_key: "s3cr3t"
You shouldn't start pods in K8s directly but rather wrap them in a higher-level resource such as a Deployment, which makes them benefit from the (deployment) controller logic when updating or self-healing the pod. I did that, so the result looks something like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: runner
  name: runner
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: runner
  template:
    metadata:
      labels:
        app: runner
    spec:
      restartPolicy: Always
      serviceAccountName: ci-builder
      #securityContext:
      #  runAsNonRoot: true
      #  runAsUser: 1000
      #  runAsGroup: 1000
      #  seccompProfile:
      #    type: RuntimeDefault
      volumes:
        - name: workspace
          emptyDir:
            sizeLimit: 5Gi
      containers:
        - name: runner
          image: ghcr.io/christopherhx/gitea-actions-runner:v0.0.11
          #securityContext:
          #  readOnlyRootFilesystem: true
          #  allowPrivilegeEscalation: false
          #  capabilities:
          #    drop:
          #      - ALL
          volumeMounts:
            - mountPath: /home/runner/_work
              name: workspace
          env:
            - name: ACTIONS_RUNNER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
              value: "true"
            - name: ACTIONS_RUNNER_CONTAINER_HOOKS
              value: /home/runner/k8s/index.js
            - name: GITEA_INSTANCE_URL
              value: https://foo.bar
            - name: GITEA_RUNNER_REGISTRATION_TOKEN
              valueFrom:
                secretKeyRef:
                  name: gitea
                  key: token
            - name: GITEA_RUNNER_LABELS
              value: k8s
          resources:
            requests:
              cpu: 500m
              memory: 2Gi
            limits:
              cpu: 1000m
              memory: 8Gi
A few changes I made here:
So, those are simply improvement suggestions for the future. For now, as you can see, I've been trying to keep it as simple as possible, but I still run into an issue. The runner starts and registers, but when using the job you provided I run into the following error returned by the job:
Edit: the root cause seems to be somewhere here: https://github.com/actions/runner/blob/v2.314.0/src/Runner.Worker/Program.cs#L20 In addition, I found that providing a runner config by mounting one and setting the |
I haven't gotten this kind of error before (at least not for a year)
Sounds like the message inside the container got trimmed before it reached the actions/runner. Based on the error, the beginning was sent to the actions/runner successfully. Maybe some data specific to your test setup is causing this (even parts not in the repo are stored in the message). I would need to add more debug logging to diagnose this. |
If you add the logging I can reproduce the issue if you like. My guess is that it's maybe proxy related. But I can't tell from the error logs. |
@omniproc you made changes via the deployment file that are not compatible with actions/runner k8s container hooks and I have no idea if using a deployment is possible.
the workspace cannot be an emptyDir volume; like in my example files, it is required to be a persistentvolumeclaim. You can technically change the name of the pvc via
This led to the mkdir error. It would require an emptyDir mount:
- mountPath: /data
  name: data
Maybe if I create that dir in the Dockerfile it would work without that, as long as your fs is read-write. The nightly doesn't have sudo anymore in the start.sh file, but it can still certainly break existing non-k8s setups as of now.
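Putting both together, the relevant part of the runner pod spec would look roughly like this (a sketch; the PVC name and the nightly image tag are placeholders for whatever you actually use):
      volumes:
        - name: workspace
          persistentVolumeClaim:
            claimName: runner-work        # must be a PVC, not an emptyDir
        - name: data
          emptyDir: {}                    # scratch mount required at /data by the nightly image
      containers:
        - name: runner
          image: ghcr.io/christopherhx/gitea-actions-runner:nightly   # placeholder tag
          volumeMounts:
            - mountPath: /home/runner/_work
              name: workspace
            - mountPath: /data
              name: data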
I found a mistake in the python wrapper file: probably due to RAM resource constraints, os.read read less than expected and shortened the message. I also added some asserts about return values of pipe communication + env. Please try to use that nightly image; it should get you to the point where, having omitted the persistentvolumeclaims of my example, kubernetes cannot start the job pod (also make sure to create an emptyDir mount at /data/). |
I'm now able to start the runner in a k8s namespace with DinD mode. How can I scale up the runners by setting replicas=2 or 3? |
@ChristopherHX Hi, an interesting project there! Just a little advice here:
I used a StatefulSet and its volumeClaimTemplates functionality to dynamically provision PVCs and get the PVC names into the container as an env var, like the following:
volumeClaimTemplates:
  - metadata:
      name: work
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
and refer to it as:
env:
  - name: ACTIONS_RUNNER_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: ACTIONS_RUNNER_CLAIM_NAME
    value: work-$(ACTIONS_RUNNER_POD_NAME)
A full working example that I tested is also available at https://github.com/traPtitech/manifest/blob/3ff7e8e6dfa3e0e4fed9a9e8ca1ad09f9b132ff1/gitea/act-runner/gitea-act-runner.yaml. |
Thanks for your example, it makes manual scaling pretty straightforward and works in minikube for testing purposes even with 4 replicas. The first time I read your response I thought |
is it possible that the runner doesn't yet support no-proxy? With the http_proxy / https_proxy and no_proxy env vars set, I see the runner using the proxy:
but it doesn't mention the no_proxy setting and later on errors when trying to connect to itself using its pod IP (which is in the no_proxy list)
|
I didn't go through the limitations of actions/runner proxy support myself. They seem to ignore ip exclusions.
Not sure how my gitea runner can switch to hostnames; maybe try to reverse-dns the ip and automatically add it to NO_PROXY? |
You can simply use something like this to add it to no-proxy:
But that won't work if they ignore IP addresses for no_proxy (and in fact, I tested it and it doesn't work). So, why does the runner try to contact itself via its external interface anyway? Why not use Besides, you could always use the DNS service built into k8s, but that would only work if that DNS name is used by the runner instead of the IP, see https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pods
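For reference, per the linked docs a pod gets an A record of the form <pod-ipv4-with-dashes>.<namespace>.pod.<cluster-domain>, so for a pod with IP 172.17.0.3 in the default namespace that would be:
172-17-0-3.default.pod.cluster.local
|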
I can send a single hostname/ip + port to the actions/runner. If I send localhost
If my gitea runner adapter were part of the Gitea backend, we would have a real hostname that forwards into nested containers, like the ARC has. |
Well, then let's use the k8s DNS service. That would work.
Can you point me to where this is done? Can this be configured? |
Here I set the connection address, but there are some url entries; maybe one of them is still pointing to the gitea instance without redirection |
So, if I understand that correctly, nektos artifactcache allows you to set Because So, what do you think about making the IP configurable via an environment variable? |
Is it currently viable to run gitea actions on k8s or is this still very much a work in progress? |
Yes, I mostly agree about this. However, I wouldn't put CACHE into the env var name; more something like GITEA_ACTIONS_RUNNER_HOST, because a fake actions runtime is also implemented in the runner (it uses more than one port for tcp listeners). Eventually, if unset, prefer hostnames over ips, but that needs testing on my side (probably behind a feature env var). I would queue this into my todo list tomorrow, working on multiple projects... |
@omniproc Proxy should now work in tag v0.0.13. Use the following env:
- name: GITEA_ACTIONS_RUNNER_RUNTIME_USE_DNS_NAME
  value: '1'
- name: GITEA_ACTIONS_RUNNER_RUNTIME_APPEND_NO_PROXY # this appends the dns name of the pod to no_proxy
  value: '1'
- name: http_proxy
  value: http://localhost:2939 # some random proxy address for testing, use the real one
- name: no_proxy
  value: .fritz.box,10.96.0.1 # first exclusion for gitea, second for kubernetes, adjust as needed |
This pretty much depends on your requirements; the more people try to use it, the more issues can be found & fixed
|
@ChristopherHX testing v0.0.13... getting closer... now TLS errors. I'm not sure from the log output whether this happens due to the mentioned RFC 6066 issue (I was under the impression that DNS names will now be used, so not sure why this is logged anyway) or because the CA of the proxy is missing. I'll try to mount the CA to the runner and see what happens. First I have to find out what location it looks up trusted CAs from.
|
@omniproc nodejs ignores ca certs from common locations on linux and does its own thing. Point the env NODE_EXTRA_CA_CERTS to your cert bundle file, including your kubernetes api cert chain; that cert bundle needs to be mounted into the runner container. I assume this undescriptive, very short error comes from kubernetes api access via https from node.
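A sketch of how that could look in the runner manifest, assuming the bundle is stored in a ConfigMap named ca-bundle with the key ca.crt (names and paths are placeholders):
          env:
            - name: NODE_EXTRA_CA_CERTS
              value: /etc/ssl/custom/ca.crt   # full chain incl. proxy CA and kubernetes api CA
          volumeMounts:
            - name: ca-bundle
              mountPath: /etc/ssl/custom
              readOnly: true
      volumes:
        - name: ca-bundle
          configMap:
            name: ca-bundle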
For the dind backend I wrote a provisioning script for my self-signed certs for all containers run by actions/runner; I could look into creating containers using modified k8s hooks for cert provisioning. By default, every container you use is assumed to have the env NODE_EXTRA_CA_CERTS set and the ca folders populated if you use self-signed certs, which is not really practical... EDIT: Is your kubernetes api accessed via your proxy? |
@ChristopherHX I can confirm it's working. It was two issues (as you expected):
A few UX improvement suggestions from my side. As a user, when I configure no_proxy I usually only have the URLs in mind that I know should not be proxied but have to be reached by the runner. I know them because I usually configure them explicitly in my pipeline (e.g. the Git repo). What I don't know is what other stuff the runner has to reach. Of course, on second thought, it's obvious why Node tries to reach the K8s API. But since it's the runner that wants to reach it, I think it should be the runner's responsibility to set up everything it can to make this happen. So my suggestion would be:
|
So now that the runner starts a new pod for the workflow, I was trying to get DinD to work in it using catthehacker/ubuntu:act-22.04 as the job container image, which doesn't work since the docker socket is not available. I know that in theory it's possible, because gitea/act_runner:nightly-dind-rootless can run DinD, but that image is of course missing all the Act components. So before I start fiddling around building a hybrid of catthehacker and dind-rootless: how did you get DinD to run, @ChristopherHX? |
@omniproc I might have caused confusion here: I have not set up dind in the job container yet. I did this only for the runner (which is by default a docker cli client) outside of kubernetes. I would expect that using a custom fork of https://github.com/actions/runner-container-hooks could configure a dind installation placed in the external tools (the folder that has node20 etc. for the runner) on any job container.
This is similar to how I did it e.g. in docker compose (docker hook mode): https://github.com/ChristopherHX/gitea-actions-runner/blob/main/examples/docker-compose-dind-rootless/docker-compose.yml This only works if you don't use the k8s container hooks, but I'm not sure if the docker.sock bind mount works in that setup as I didn't make use of it. This approach has flaws: if you try to run the following you get strange bugs
|
@ChristopherHX so I got a working prototype of this. Instead of using DinD, which arguably is a security nightmare more often than not (or comes with many limitations as of today when running unprivileged), I switched to buildkit, which doesn't require any privileges and can be executed in a non-root pod. So the process currently looks like this:
|
@ChristopherHX is it possible that currently we cannot pass environment variables using the
In this case |
No idea, I tried the following and it passes my test (both ways to do that in a container job):
name: Gitea Actions Demo
on: [push]
jobs:
  build-docker:
    runs-on: trap-cluster
    container:
      image: buildpack-deps:noble
      env:
        MY_IMAGE_VAR: foo
    env:
      MY_GLOBAL_VAR: foo
    steps:
      - name: Checkout
        run: |
          echo "MY_GLOBAL_VAR: $MY_GLOBAL_VAR"
          echo "MY_IMAGE_VAR: $MY_IMAGE_VAR"
MY_GLOBAL_VAR is expected to be echoed for every step like in my log, while MY_IMAGE_VAR is absent in |
@ChristopherHX you're right. I expected MY_IMAGE_VAR to be available in the env scope. It is not. However, it is available in the steps when running a shell, and it is visible in the env of the pod definition. MY_GLOBAL_VAR, on the other hand, is available in the env scope within the action and can be accessed from the shell, but is not visible in the pod spec. Interesting. I didn't know that difference between those two envs before. Thanks for the clarification.
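In other words, with the workflow above (a small illustration of the scoping difference):
      - run: echo "${{ env.MY_IMAGE_VAR }}" # empty, the container env does not populate the env context
      - run: echo "$MY_IMAGE_VAR"           # prints foo, it is set in the container's process environment
|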
from my experience with different tools that work with kubernetes:
|
I've been following this for some time now as I'd really love to switch over to Gitea Actions and move away from Jenkins as a CI/CD tool. The main big thing preventing me from making a case for it is exactly this: a way to dynamically create temporary pods that only live for as long as the Jenkins job / Gitea workflow, so that resourcing can be controlled in a native Kubernetes way. I agree with everything @querplis has said above, and can state that the Jenkins Kubernetes plugin has the same functionality.
|
This already works, as documented in this GH issue.
It's not needed, but it's good practice to separate concerns (and arguably this design also leads to single responsibility and possibly open-closed). The job of the controller is to talk to and monitor the K8s API and, based on that, do what it must on the backend system, and/or, vice versa, wait for instructions from the external system (Gitea) and perform the required tasks in K8s.
True, a simple pod with multiple containers might do for most cases. More complex scenarios however might require pre or post steps. Take a look at the ecosystems of FluxCD and ArgoCD and how they evolved from basically what you argue for into something much bigger and more complex. But I agree that for an initial implementation, having granular control over a pod, started as a k8s job or a simple pod, is good enough. However, at that point, since it's just as easy as applying a manifest against the K8s API, why limit it? Just leave it up to the user what he wants to define in the manifest and have it applied as the Gitea Actions workflow starts. A simple label system could signal Gitea which pods to consider important for the workflow to fail (e.g. apply a label gitea-action-must-succeed to all pods that Gitea will consider relevant for the workflow to succeed).
Same here. Simply allow the user to supply a K8s manifest with the workflow and the controller would apply it to k8s.
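For illustration, such a user-supplied manifest could be as small as this, with the proposed (hypothetical) gitea-action-must-succeed label acting as the marker the controller would watch:
apiVersion: v1
kind: Pod
metadata:
  name: build-job
  labels:
    gitea-action-must-succeed: "true"   # hypothetical marker label from the proposal above
spec:
  restartPolicy: Never
  containers:
    - name: build
      image: ubuntu:latest
      command: ["bash", "-lc", "make test"]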
Secrets in Kubernetes are not designed to be "secret". They are simply configmaps that can be protected using RBAC. Within a namespace, all pods can access them, just like any configmap. Don't want that? Create a separate namespace and use RBAC. Don't try to re-invent the wheel; it will only break 90% of the k8s tools you might need, since they pretty much all expect you to use secrets the way they're meant to be used.
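A minimal sketch of that RBAC scoping (all names are placeholders):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: runner-secret-reader
  namespace: ci
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["act-runner-registration"]   # only this secret is readable
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: runner-secret-reader
  namespace: ci
subjects:
  - kind: ServiceAccount
    name: gitea-act-runner
    namespace: ci
roleRef:
  kind: Role
  name: runner-secret-reader
  apiGroup: rbac.authorization.k8s.io
|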
Hi @omniproc, @ChristopherHX, I've set up my runner according to all of the examples you've provided here. However, I've come to a standstill where I cannot clone a repository. My runner is based on @ChristopherHX's image: I've tried with every container in my workflow: At this step it always fails
With this error: How did you fix it? Were there any additional environment variables added to the workflow or the runner itself? Thanks for everything you guys have done so far, by the way! |
Seems like you either did not install node or your nodejs env var is pointing to an empty dir. The image will not just have every binary you need. Either you build your own image that bundles nodejs, or you use one of the many install-nodejs actions available as a pre-step in your workflow. I don't have any node dependencies, so I never tested node builds. |
Yeah my problem is that I don't have anything that uses node. I'm just trying to check out my repository. Basically with your example I cannot check out my repository. This is my workflow
|
You are using the github checkout action, which is a javascript action executed by nodejs. Can't tell why it is failing from the little information you provided. |
@djeinstine Do
Yes, exactly. Something like parts of your kubernetes config could be helpful. Node normally doesn't need to be installed; this node folder should be copied to the persistent volume claim by https://github.com/actions/runner-container-hooks (k8s edition) during the Setup Job step. |
Yes, I didn't post my config. I took inspiration from this post and the full working example from @motoki317 here: https://github.com/traPtitech/manifest/blob/3ff7e8e6dfa3e0e4fed9a9e8ca1ad09f9b132ff1/gitea/act-runner/gitea-act-runner.yaml and came up with the following. Relevant part of gitea-act-runner.yaml:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gitea-act-runner
spec:
  serviceName: gitea-act-runner
  replicas: 1
  revisionHistoryLimit: 0
  volumeClaimTemplates:
    - metadata:
        name: work
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: "local-path"
        resources:
          requests:
            storage: 1Gi
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Delete
    whenDeleted: Delete
  selector:
    matchLabels:
      app: gitea-act-runner
  template:
    metadata:
      labels:
        app: gitea-act-runner
    spec:
      serviceAccountName: gitea-act-runner
      containers:
        - name: runner
          image: ghcr.io/christopherhx/gitea-actions-runner:v0.0.13
          imagePullPolicy: Always
          env:
            - name: ACTIONS_RUNNER_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ACTIONS_RUNNER_CLAIM_NAME
              value: work-$(ACTIONS_RUNNER_POD_NAME)
            - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
              value: "true"
            - name: ACTIONS_RUNNER_CONTAINER_HOOKS
              value: /home/runner/k8s/index.js
            - name: GITEA_INSTANCE_URL
              value: https://gitea.nas.homespace.ovh/
            - name: GITEA_RUNNER_REGISTRATION_TOKEN
              valueFrom:
                secretKeyRef:
                  name: act-runner
                  key: registration-token
            - name: GITEA_RUNNER_LABELS
              value: beowulf-cluster
            - name: GITEA_RUNNER_NAME
              value: beowulf-act-runner
          volumeMounts:
            - mountPath: /home/runner/_work
              name: work
          resources:
            requests:
              cpu: "100m"
              memory: "500Mi"
            limits:
              cpu: "1"
              memory: "2Gi"
Demo workflow demo.yaml:
name: Gitea Actions Demo
run-name: Testing out Gitea Actions 🚀
on: [push]
jobs:
  Explore-Gitea-Actions:
    runs-on: beowulf-cluster
    container:
      image: ghcr.io/catthehacker/ubuntu:act-22.04 #ghcr.io/omniproc/act-buildkit-runner:latest #
    steps:
      - name: Check out repository code.
        uses: https://github.com/actions/checkout@v4 |
Something is odd in your kubernetes cluster...; maybe try a different storage provider or change size limits? I have no clue why the external files are not there; for me they are always copied back if I delete them manually. The externals folder is intact in the image as well, otherwise the k8s hooks couldn't run either.
This difference is not an issue for me, works with both default and this one. Had to enable this provider in my minikube.
Please add a run step before checkout that checks that /__e contains the node tool by recursively enumerating the folder. Maybe add a e.g. if I look at my kubernetes, check that the externals folder has the node program inside of it and that you can execute it.
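For example, a debug step along these lines before the checkout (a sketch; the node folder name under /__e may differ depending on the runner version):
      - name: Inspect externals
        run: ls -la /__e && ls -la /__e/node20/bin && /__e/node20/bin/node --version
      - name: Check out repository code.
        uses: https://github.com/actions/checkout@v4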
The k8s container hooks you are using here are unchanged ones from https://github.com/actions/runner-container-hooks/releases/tag/v0.6.1 using unchanged actions/runner 2.317.0
This is the function responsible to provide the externals that are not found: https://github.com/actions/runner-container-hooks/blob/73655d4639a62f6e4b3d70b5878bc4367c0a436e/packages/k8s/src/hooks/prepare-job.ts#L184-L193 |
@ChristopherHX Node/Storage Side Results: I can see all of the relevant files listed in your response. The only difference being the folder you have as
Runner/Pod Side Results:
So I did an ls two folders above the entry point ( So it seems to me the hooks are perfectly fine, but the mounts are not working properly. |
My example is pretty much the same as yours, but our clusters are the most different factor here. I'm using minikube on my arm64 based server. example.yml (This is an export of my minikube with the local-path provider as you have shown in your snippet, some generated fields have been removed again) The
What I didn't understand: do subPath mounts work on your cluster if you create a pod yourself with a subPath mount of a volume, like done by the k8s hooks? For me it looks like a k8s runner container hooks bug or missing functionality in your kubernetes cluster. I read a while back that older versions of dockerd didn't support mounting a sub folder within a volume to a container, but idk if that is correct. I assume modifying the k8s hooks (link to the code in earlier comments) so they don't mount /
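To check that independently of the runner, a throwaway pod roughly like this (assuming the work-gitea-act-runner-0 PVC from the StatefulSet above and an arbitrary subfolder name) should come up and list the mounted subfolder:
apiVersion: v1
kind: Pod
metadata:
  name: subpath-test
spec:
  restartPolicy: Never
  containers:
    - name: test
      image: busybox:latest
      command: ["ls", "-la", "/mnt"]
      volumeMounts:
        - name: work
          mountPath: /mnt
          subPath: subpath-test          # the k8s hooks mount sub folders of the volume like this
  volumes:
    - name: work
      persistentVolumeClaim:
        claimName: work-gitea-act-runner-0
|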
@ChristopherHX |
You mentioned you use Talos, which comes with some special requirements for local path provisioner. Maybe that was the issue? https://www.talos.dev/v1.7/kubernetes-guides/configuration/local-storage/ |
I just looked at my extra mounts section and I didn't mount the hostPath mounts. Looks like I skipped that section of the docs. |
Starting from direct access and then moving to a controller as/if needed is a way to get there faster, since a controller will add extra complexity.
It's not fluxcd and argocd that we should look at in this case, since they do completely different things, but jenkins, gitlab and drone.
exactly!
What I was trying to say is that if there is an option to not use k8s secrets for storing job secrets, but instead inject them from somewhere else directly into the job, then that might be a better option, which lets people drastically reduce the number of namespaces they need just to isolate, in some cases, a single secret value. |
I'd argue if you end up with 1 namespace per secret you either have only one secret per concern or your architecture should be refactored. "injecting them from somewhere else" however is always easy if the user has full access to the manifests that should be applied. I personally view the sideloading of secrets as an anti-pattern, but you might have a different opinion on the matter. |
Hi! Thank you to everyone here for their awesome work on this. I just got a functional version up and running in my homelab. I'm curious about the status on this project and if there is any intention to continue iterating on the great work that y'all have done so far. Specifically, I'm curious about the following:
I'd like to be able to define resource requests/limits for the workflow Pods that are being created. Would this be considered in-scope and an iteration of what y'all have built already? Are there plans to fully convert GitHub's Action Runner Controller to be compatible with Gitea? Is there anything I can do to help? |
Sorry for being silent for some time; I am now back on track for the next iteration. I have a k8s autoscaler proof of concept running on my raspberry pi4. Read more here: https://gitea.com/gitea/helm-actions/issues/8 We can discuss this in this issue as well if you want; if you are interested in being an early tester, feel free to reach out to me. The test setup is a bit experimental. I worked on these two gitea features last February and now they are merged in gitea 1.24 nightly; everything else can be worked around in the autoscaler component.
I think this will be possible with the above new approach. I didn't comment earlier because the additions to gitea were not merged and the k8s runner had not been tested at that point in time.
Not in my scope at the moment, but garm with the k8s extension should be able to fill most of the gaps you are currently dealing with. |
Just confirming that this is being worked on here: It might take a while to get it merged, as I need to shift to something else for a while, but will allocate time as it becomes available. At this point, spinning up runners for repos seems to work and will add orgs soon. The current state is that it still needs a lot of tests before it's ready. If anyone wants to test the existing WiP branch, feel free to do so. All existing providers should work as long as they allow setting the runner install template (as detailed in the PR). For the k8s provider I think you can create a custom image with a gitea compatible entrypoint that sets up the runner. |
Okay. Initial Gitea support is now in the https://github.com/cloudbase/garm/blob/main/doc/gitea.md Right now it supports runners at the repository and organization levels. We can later add Edit: Thanks @ChristopherHX for working on all the bits and pieces that were needed in gitea and act_runner in order to enable this work to be done in GARM! |
Note, in the doc above LXD is used as a provider (it supports both system containers and VMs), but you can also use the k8s provider, or the AWS/GCP/Azure/OpenStack providers. Or you can write your own provider (the interface is simple) for anything that doesn't already exist (like vmware or proxmox). You can create pools using any one of the configured providers, associate a priority with pools and stack the higher priority ones first before moving to the next, or just round-robin between pools with similar labels. |
Feature Description
The Gitea Actions release was a great first step. But currently it's missing many features of a more mature solution based on K8s runners rather than single nodes. While it's possible to have runners on K8s, this currently requires DinD, which has its whole set of problems, security issues (privileged exec required as of today) and feature limitations (you can't use DinD to start another container to build a container image (DinDinD)). I know workarounds exist with buildx, but those are just that: workarounds.
I think the next step could be something like what actions-runner-controller is doing for GitHub Actions: basically an operator that is deployed on K8s and registers as a runner. Every job it starts is then started in its own pod rather than on the runner itself. The runner coordinates the pods.
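For reference, with GitHub's actions-runner-controller the runner fleet is declared via a CRD along these lines (a rough sketch of the kind of interface meant here; a Gitea equivalent would presumably look similar, and the repository value is a placeholder):
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runners
spec:
  replicas: 2
  template:
    spec:
      repository: my-org/my-repo     # placeholder repository
      labels:
        - k8s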
Related docs:
Screenshots
No response