-
Notifications
You must be signed in to change notification settings - Fork 6
Fix CI cache restore #156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix CI cache restore #156
Conversation
0bdd769
to
04735fa
Compare
The old method also didn't reproduce the same environment if there was a cache hit (the sandbox was disabled in this case).
This should be more reliable than the old cache-install action because all the software is operated according to their intended interfaces.
2e66d9f
to
da11a13
Compare
Thank you so much for figuring this hairy bug out and also fixing the respective bug in the jekyll project! I appreciate leaving Given your convincing arguments for not having a cache (at least not for DiffDetective), I am in favor of reducing complexity and just going with change 1. Maybe leaving a short comment at the place where the caching would needs to be, which points to this PR would be nice. So this means we would merge your first commit but drop the second one? |
After testing (everything works as excepted) and measuring the cache, we noticed that the time improvement is not worth the additional complexity. Hence, we decided to remove this cache but leave it in the commit history in case it becomes relevant in the future. This effectively reverts commit da11a13 "CI: Cache dependencies between runs".
In case the Nix sandbox is enabled, the github-metadata plugin automatically disables network accesses. However, if the sandbox is disabled (e.g., on MacOS or by explicitly disabling it), the github-metadata plugin tries to access the GitHub API and (fortunately) fails which aborts the build. As the required patch is currently unreleased, the patch is directly fetched from GitHub. This overlay can be removed as soon as this patch lands in nixpkgs and we update to such a nixpkgs version.
Yes, that's exactly what it does.
I deciced to revert the commit instead of dropping it. This keeps this caching mechanism in the commit history for future reference. I also pushed a fix for problem 1. It applies the patch for |
sounds good to me :) |
I identified the feature interaction that caused sporadic CI failures:
share/github-pages/DiffDetective/
in thenix-build
output). However, GitHub Pages uses thegithub-metadata
Jekyll plugin which connects to the GitHub API to retrieve some metadata (which is uninteresting for us). This step is skipped in case there is no internet connection./etc/nix/nix.conf
is missing if the Nix installation is restored from the cache), the Nix sandbox settings differ between CI runs. In particular, if there is a cache hit, the Nix sandbox is disabled and thus internet connections become possible during all Nix builds.The combination of these two problems would result in unreproducible CI builds (which we probably would have never found).
Fortunately (in the sense that this problem became noticeable), there is another interaction that masks these unreproducible CI builds: By default, Nix doesn't have TLS certificates installed in the build environment (they shouldn't be needed anyways). When
github-metadata
tries to access the GitHub API, if there was a cache hit, it fails to validate the GitHub server because the TLS certificate is missing and thus aborts the build.Problem 1 is effectively solved by jekyll/github-metadata#285. We just need to upgrade the
github-metadata
version or wait untilnixpkgs
adopts that version and update nixpkgs. Then we can just set the environment variablePAGES_DISABLE_NETWORK
to reliable disable network connections even if the Nix sandbox is disabled.During investigation of problem 2, I noticed that the
cache-install
action, which implements the Nix installation caching for us, is kind of broken by design. I would really like to fix this, unfortunately this design is necessary due to limitations (regarding file permissions) of GitHub's cache action. After some experimentation I found a better approach and a more reliable caching mechanism:cache-install
with install-nix-action which just installs Nix in the GitHub Actions environment but is much faster even without using a similar cache. Though, this new installation doesn't cache our dependencies.cache-install
needs to integrate deeply into the Nix installation internals).During testing I found that change 1 decreases our build times from about 4:35 min to about 1:39 min. Hence, the cache of change 2 doesn't offer the same benefit as before. In particular, the build time when creating the cache increases to about 1:52 min and decreases with on a cache hit to about 1:19 min.
@pmbittner Do you think we should still implement the cache or just drop change 2 and stick with only using change 1? Note that with our current development pace, the cache is only rarely useful because it's automatically purged by GitHub after one week (that's the reason the CI failures were only sporadic).