This prevents issues I'm seeing with the hydra I'm running on my laptop.
Every time I reboot it I see eval errors like this:
```
error fetching latest change from git repo at `https://github.com/nixos/nixpkgs.git':
fatal: unable to access 'https://github.com/nixos/nixpkgs.git/': Could not resolve host: github.com
```
This is because the evaluator already starts before the network is
actually online. It should wait until the network is fully online before
starting evaluation to prevent evaluation errors like above.
Nothing the script `config.sh` does prior to the final call to
`Runner.Listener configure` is relevant for the systemd service.
Particularly, we don't need (nor want) any of the artifacts the `env.sh`
script creates.
We’ve been having trouble figuring out which kind of token to use and
why our setup would break every few system updates.
This should clarify which options there are, and which ones lead to
better results.
Ideally there would be a manual section that has a step-by-step guide
on how to set up the github runner, with screenshots and everything.
Purge contents of `workDir` as root to also allow the removal of files
marked as read-only. It is easy to create read-only files in `workDir`,
e.g., by copying files from the Nix store.
The `serviceOverrides` module option is commonly used to loosen the
systemd unit's hardening. This commit merges the `serviceConfig` with
`mkMerge` instead of using the update operator `//` which discards all
existing values on conflict. To avoid a breaking change which requires
defining each option with a higher priority (e.g., through `mkForce`),
this commit prefixes hardening values with `mkDefault`.
Notable exceptions are list hardening options which use `mkBefore`
instead of `mkDefault`. This allows for easy extension of the existing
settings. Resetting redefinitions are still possible through `mkForce`.
when you register a runner with spaces in its name (possible if you use 'description' option) then the runners never get unregistered because our bash scripts assume no space in names.
This solves the issue
Retreiving the fullname of the runner via `gitlab-runner list` got surprisingly hard between lazy-capture issues and `gitlab-runner list` displaying invisible (CSI) characters that break the regex etc.
Which is why I fell back on the pseudo-json format.
This PR adds the hash in the name, which allows to keep both the
stateless aspect of the module while allowing for a freeform name.
I found using bash associative arrays easier to use/debug than the current
approach.
On some occasions, the GitHub runner service encounters errors which are
deemed retryable but result in the runner's termination. To signal a
retryable error, the runner exits with status code 2:
https://github.com/actions/runner/blob/40ed7f8/src/Runner.Common/Constants.cs#L146
To account for that behavior, this commit sets
`RestartForceExitStatus=2` which results in a service restart regardless
of using an ephemeral runner or not.
The new defaults allows jenkins-job-builder to reload the configuration
out-of-the-box, whereas the previous defaults required users to manually
reload/restart jenkins, or configure accessUser/accessTokenFile
themselves.
(If `extraJavaOptions = [ "-Djenkins.install.runSetupWizard=false" ]`
then the initial admin user is *not* created and you have to use JCasC
or something else to bootstrap.)
This commit fixes two bugs:
1) When starting a github-runner for the very first time, the
unconfigure script did not copy the `tokenFile` to the state
directory. This case just was not handled so far. As a result, the
runner could not configure. The unit did, however, fail even before
as the state token file is configured as inaccessible for the service
through `InaccessiblePaths=`. As the given path did not exist in the
described case, setting up the unit's namespacing failed.
2) Similarly, the `tokenFile` is also marked as not accessible to the
service user. There are, however, cases where other namespacing
options make the files inaccessible even before `InaccessiblePaths=`
kicks in; thus, they appear as non existing and cause the namespacing
to fail yet again. Prefixing the entry with a `-` causes Systemd to
ignore the entry if it cannot find it. This is the behavior we want.
I also took fixing those bugs as a chance to refactor the unconfigure
script to make it easier to follow.
conversions were done using https://github.com/pennae/nix-doc-munge
using (probably) rev f34e145 running
nix-doc-munge nixos/**/*.nix
nix-doc-munge --import nixos/**/*.nix
the tool ensures that only changes that could affect the generated
manual *but don't* are committed, other changes require manual review
and are discarded.
using regular strings works well for docbook because docbook is not as
whitespace-sensitive as markdown. markdown would render all of these as
code blocks when given the chance.
The `runsvc.sh` script wraps a JavaScript script which starts
`Runner.Listener` and also handles failures. This has the downside that
the service _always_ exits with status code 0, i.e., success. This
causes frequent service restarts when running in ephemeral mode with a
faulty config as Systemd always sees a success exit status. To prevent
this, this commit changes the service config to call `Runner.Listener`
directly. The JavaScript wrapper stops the process with a SIGINT, hence,
the Systemd unit now sends a SIGINT to stop the service.
Adds the module option `ephemeral`. If set to true, configures the
runner registration with the `--ephemeral` option. This causes the
runner to exit after processing a single job, to de-register itself, and
to delete its configuration. Afterward, systemd restarts the service
which triggers a new ephemeral registration with a clean state.
This commit introduces support for runner registrations through a
personal access token (PAT). To use a PAT instead of a registration
token, place an appropriately scoped PAT in `tokenFile`. If the file
contains a PAT, the configuration script queries a new runner
registration token. Using a runner registration token directly continues
to work as before.
Using the runtime directory as `RUNNER_ROOT` is wrong. We should always
use the state directory like we already do when invoking the runner
configure script. Otherwise, the runner constructs the wrong path for
some files (.credentials, .runner, ...).
make (almost) all links appear on only a single line, with no
unnecessary whitespace, using double quotes for attributes. this lets us
automatically convert them to markdown easily.
the few remaining links are extremely long link in a gnome module, we'll
come back to those at a later date.
the conversion procedure is simple:
- find all things that look like options, ie calls to either `mkOption`
or `lib.mkOption` that take an attrset. remember the attrset as the
option
- for all options, find a `description` attribute who's value is not a
call to `mdDoc` or `lib.mdDoc`
- textually convert the entire value of the attribute to MD with a few
simple regexes (the set from mdize-module.sh)
- if the change produced a change in the manual output, discard
- if the change kept the manual unchanged, add some text to the
description to make sure we've actually found an option. if the
manual changes this time, keep the converted description
this procedure converts 80% of nixos options to markdown. around 2000
options remain to be inspected, but most of those fail the "does not
change the manual output check": currently the MD conversion process
does not faithfully convert docbook tags like <code> and <package>, so
any option using such tags will not be converted at all.
This change allows detecting configuration errors during
switch-to-configuration instead of them being reported asynchronously
*after* switch-to-configuration has exited.
(And update the NixOS test accordingly.)
The current authentication code is broken against newer jenkins:
jenkins-job-builder-start[1257]: Asking Jenkins to reload config
jenkins-start[789]: 2022-07-12 14:34:31.148+0000 [id=17] WARNING hudson.security.csrf.CrumbFilter#doFilter: Found invalid crumb 31e96e52938b51f099a61df9505a4427cb9dca7e35192216755659032a4151df. If you are calling this URL with a script, please use the API Token instead. More information: https://www.jenkins.io/redirect/crumb-cannot-be-used-for-script
jenkins-start[789]: 2022-07-12 14:34:31.160+0000 [id=17] WARNING hudson.security.csrf.CrumbFilter#doFilter: No valid crumb was included in request for /reload by admin. Returning 403.
jenkins-job-builder-start[1357]: curl: (22) The requested URL returned error: 403
Fix it by using `jenkins-cli` instead of messing with `curl`.
This rewrite also prevents leaking the password in process listings. (We
could probably do it without `replace-secret`, assuming `printf` is a
shell built-in, but this implementation should be safe even with shells
not having a built-in `printf`.)
Ref https://github.com/NixOS/nixpkgs/issues/156400.
The description for the runner in the UI is by default sthg like
"npm_nixos_d0544ed48909" i.e., the name of the attribute.
I wanted to have a more user-friendly description and added a
description to the service.
Seems like gitlab-runner doesn't like having both fields set:
"Cannot use two forms of the same flag: description name"
so I used one or the other.
Installs Java into the Jenkins agent and allows specifying the JDK/JRE package to use. This is necessary as Jenkins verifies if the agent contains Java installed through the java -fullversion command, which if not, the connection will fail.
We spent a whole afternoon debugging this, because upstream has very
bad software quality and the error messages were incredibly
misleading.
So let’s document it for the sanity of other people.
Btw, I think the implementation of our module is pretty brittle,
especially the part about diffing tokens to check whether they
changed. We should rather just request a new builder registration
every time, it’s not that much overhead, and always set `replace` so
it is idempotent.