fixes#232505
Implements the new option `security.acme.maxConcurrentRenewals` to limit
the number of certificate generation (or renewal) jobs that can run in
parallel. This avoids overloading the system resources with many
certificates or running into acme registry rate limits and network
timeouts.
Architecture considerations:
- simplicity, lightweight: Concerns have been voiced about making this
already rather complex module even more convoluted. Additionally,
locking solutions shall not significantly increase performance and
footprint of individual job runs.
To accomodate these concerns, this solution is implemented purely in
Nix, bash, and using the light-weight `flock` util. To reduce
complexity, jobs are already assigned their lockfile slot at system
build time instead of dynamic locking and retrying. This comes at the
cost of not always maxing out the permitted concurrency at runtime.
- no stale locks: Limiting concurrency via locking mechanism is usually
approached with semaphores. Unfortunately, both SysV as well as
POSIX-Semaphores are *not* released when the process currently locking
them is SIGKILLed. This poses the danger of stale locks staying around
and certificate renewal being blocked from running altogether.
`flock` locks though are released when the process holding the file
descriptor of the lock file is KILLed or terminated.
- lockfile generation: Lock files could either be created at build time
in the Nix store or at script runtime in a idempotent manner.
While the latter would be simpler to achieve, we might exceed the number
of permitted concurrent runs during a system switch: Already running
jobs are still locked on the existing lock files, while jobs started
after the system switch will acquire locks on freshly created files,
not being blocked by the still running services.
For this reason, locks are generated and managed at runtime in the
shared state directory `/var/lib/locks/`.
nixos/security/acme: move locks to /run
also, move over permission and directory management to systemd-tmpfiles
nixos/security/acme: fix some linter remarks in my code
there are some remarks left for existing code, not touching that
nixos/security/acme: redesign script locking flow
- get rid of subshell
- provide function for wrapping scripts in a locked environment
nixos/acme: improve visibility of blocking on locks
nixos/acme: add smoke test for concurrency limitation
heavily inspired by m1cr0man
nixos/acme: release notes entry on new concurrency limits
nixos/acme: cleanup, clarifications
This is not unlikely to happen, given the enthusiasm shown by some users,
but we are not there yet, and this will save them from breaking their system.
Given that we are no longer inspecting the target of the /proc/self/exe
symlink, stop asserting that it has any properties. Remove the plumbing
for wrappersDir, which is no longer used.
Asserting that the binary is located in the specific place is no longer
necessary, because we don't rely on that location being writable only by
privileged entities (we used to rely on that when assuming that
readlink(/proc/self/exe) will continue to point at us and when assuming
that the `.real` file can be trusted).
Assertions about lack of write bits on the file were
IMO meaningless since inception: ignoring the Linux's refusal to honor
S[UG]ID bits on files-writeable-by-others, if someone could have
modified the wrapper in a way that preserved the capability or S?ID
bits, they could just remove this check.
Assertions about effective UID were IMO just harmful: if we were
executed without elevation, the caller would expect the result that
would cause in a wrapperless distro: the targets gets executed without
elevation. Due to lack of elevation, that cannot be used to abuse
privileges that the elevation would give.
This change partially fixes#98863 for S[UG]ID wrappers. The issue for
capability wrappers remains.
/proc/self/exe is a "fake" symlink. When it's opened, it always opens
the actual file that was execve()d in this process, even if the file was
deleted or renamed; if the file is no longer accessible from the current
chroot/mount namespace it will at the very worst fail and never open the
wrong file. Thus, we can make a much simpler argument that we're reading
capabilities off the correct file after this change (and that argument
doesn't rely on things such as protected_hardlinks being enabled, or no
users being able to write to /run/wrappers, or the verification that the
path readlink returns starts with /run/wrappers/).
Before this change it was crucial that nonprivileged users are unable to
create hardlinks to SUID wrappers, lest they be able to provide a
different `.real` file alongside. That was ensured by not providing a
location writable to them in the /run/wrappers tmpfs, (unless
disabled) by the fs.protected_hardlinks=1 sysctl, and by the explicit
own-path check in the wrapper. After this change, ensuring
that property is no longer important, and the check is most likely
redundant.
The simplification of expectations of the wrapper will make it
easier to remove some of the assertions in the wrapper (which currently
cause the wrapper to fail in no_new_privs environments, instead of
executing the target with non-elevated privileges).
Note that wrappers had to be copied (not symlinked) into /run/wrappers
due to the SUID/capability bits, and they couldn't be hard/softlinks of
each other due to those bits potentially differing. Thus, this change
doesn't increase the amount of memory used by /run/wrappers.
This change removes part of the test that is obsoleted by the removal of
`.real` files.
This change includes some stuff (e.g. reading of the `.real` file,
execution of the wrapper's target) that belongs to the apparmor policy
of the wrapper. This necessitates making them distinct for each wrapper.
The main reason for this change is as a preparation for making each
wrapper be a distinct binary.
Given that we are no longer inspecting the target of the /proc/self/exe
symlink, stop asserting that it has any properties. Remove the plumbing
for wrappersDir, which is no longer used.
Asserting that the binary is located in the specific place is no longer
necessary, because we don't rely on that location being writable only by
privileged entities (we used to rely on that when assuming that
readlink(/proc/self/exe) will continue to point at us and when assuming
that the `.real` file can be trusted).
Assertions about lack of write bits on the file were
IMO meaningless since inception: ignoring the Linux's refusal to honor
S[UG]ID bits on files-writeable-by-others, if someone could have
modified the wrapper in a way that preserved the capability or S?ID
bits, they could just remove this check.
Assertions about effective UID were IMO just harmful: if we were
executed without elevation, the caller would expect the result that
would cause in a wrapperless distro: the targets gets executed without
elevation. Due to lack of elevation, that cannot be used to abuse
privileges that the elevation would give.
This change partially fixes#98863 for S[UG]ID wrappers. The issue for
capability wrappers remains.
/proc/self/exe is a "fake" symlink. When it's opened, it always opens
the actual file that was execve()d in this process, even if the file was
deleted or renamed; if the file is no longer accessible from the current
chroot/mount namespace it will at the very worst fail and never open the
wrong file. Thus, we can make a much simpler argument that we're reading
capabilities off the correct file after this change (and that argument
doesn't rely on things such as protected_hardlinks being enabled, or no
users being able to write to /run/wrappers, or the verification that the
path readlink returns starts with /run/wrappers/).
Before this change it was crucial that nonprivileged users are unable to
create hardlinks to SUID wrappers, lest they be able to provide a
different `.real` file alongside. That was ensured by not providing a
location writable to them in the /run/wrappers tmpfs, (unless
disabled) by the fs.protected_hardlinks=1 sysctl, and by the explicit
own-path check in the wrapper. After this change, ensuring
that property is no longer important, and the check is most likely
redundant.
The simplification of expectations of the wrapper will make it
easier to remove some of the assertions in the wrapper (which currently
cause the wrapper to fail in no_new_privs environments, instead of
executing the target with non-elevated privileges).
Note that wrappers had to be copied (not symlinked) into /run/wrappers
due to the SUID/capability bits, and they couldn't be hard/softlinks of
each other due to those bits potentially differing. Thus, this change
doesn't increase the amount of memory used by /run/wrappers.
In user namespaces where an unprivileged user is mapped as root and root
is unmapped, setuid bits have no effect. However setuid root
executables like mount are still usable *in the namespace* as the user
already has the required privileges. This commit detects the situation
where the wrapper gained no privileges that the parent process did not
already have and in this case does less sanity checking. In short there
is no need to be picky since the parent already can execute the foo.real
executable themselves.
Details:
man 7 user_namespaces:
Set-user-ID and set-group-ID programs
When a process inside a user namespace executes a set-user-ID
(set-group-ID) program, the process's effective user (group) ID
inside the namespace is changed to whatever value is mapped for
the user (group) ID of the file. However, if either the user or
the group ID of the file has no mapping inside the namespace, the
set-user-ID (set-group-ID) bit is silently ignored: the new
program is executed, but the process's effective user (group) ID
is left unchanged. (This mirrors the semantics of executing a
set-user-ID or set-group-ID program that resides on a filesystem
that was mounted with the MS_NOSUID flag, as described in
mount(2).)
The effect of the setuid bit is that the real user id is preserved and
the effective and set user ids are changed to the owner of the wrapper.
We detect that no privilege was gained by checking that euid == suid
== ruid. In this case we stop checking that euid == owner of the
wrapper file.
As a reminder here are the values of euid, ruid, suid, stat.st_uid and
stat.st_mode & S_ISUID in various cases when running a setuid 42 executable as user 1000:
Normal case:
ruid=1000 euid=42 suid=42
setuid=2048, st_uid=42
nosuid mount:
ruid=1000 euid=1000 suid=1000
setuid=2048, st_uid=42
inside unshare -rm:
ruid=0 euid=0 suid=0
setuid=2048, st_uid=65534
inside unshare -rm, on a suid mount:
ruid=0 euid=0 suid=0
setuid=2048, st_uid=65534
The abstraction/nameservice profile from apparmor-profiles package
includes abstractions/nss-systemd. Without "reexporting" it,
the include fails and we get some errors.
There was a bug in the pam_mount module that crypt mount options were
not passed to the mount.crypt command. This is now fixed and
additionally, a cryptMountOptions NixOS option is added to define mount
options that should apply to all crypt mounts.
Fixes#230920
According to Ted Unangst, since doas evaluates rules in a last
matched manner, it is prudent to have the "permit root to do everything
without a password at the end of the file.
Source: https://flak.tedunangst.com/post/doas-mastery
This reverts commit 2265160fc0 and
e56db577a1.
Ideally, we shouldn't cause friction for users that bump `stateVersion`,
and I'd consider having to switch and/or manually hardcode a UID/GID
to supress the warning friction. I think it'd be more beneficial to, in
this rare case of an ID being missed, just let it be until more
discussion happens surrounding this overall issue.
See https://github.com/NixOS/nixpkgs/pull/217785 for more context.
this converts meta.doc into an md pointer, not an xml pointer. since we
no longer need xml for manual chapters we can also remove support for
manual chapters from md-to-db.sh
since pandoc converts smart quotes to docbook quote elements and our
nixos-render-docs does not we lose this distinction in the rendered
output. that's probably not that bad, our stylesheet didn't make use of
this anyway (and pre-23.05 versions of the chapters didn't use quote
elements either).
also updates the nixpkgs manual to clarify that option docs support all
extensions (although it doesn't support headings at all, so heading
anchors don't work by extension).
MD can only do the latter, so change them all over now to keeps diffs reviewable.
this also includes <literal><xref> -> <xref> where options are referenced since
the reference will implicitly add an inner literal tag.
markdown cannot represent those links. remove them all now instead of in
each chapter conversion to keep the diff for each chapter small and more
understandable.
fscrypt can automatically unlock directories with the user's login
password. To do this it ships a PAM module which reads the user's
password and loads the respective keys into the user's kernel keyring.
Significant inspiration was taken from the ecryptfs implementation.
With Go 1.19 calls to setrlimit are required for lego to run.
While we could allow setrlimit alone, I think it is not unreasonable to
allow @resources in general.
Closes: #197513
Lego has a built-in mechanism for sleeping for a random amount
of time before renewing a certificate. In our environment this
is not only unnecessary (as our systemd timer takes care of it)
but also unwanted since it slows down the execution of the
systemd service encompassing it, thus also slowing down the
start up of any services its depending on.
Also added FixedRandomDelay to the timer for more predictability.
Fixes#190493
Check if an actual key file exists. This does not
completely cover the work accountHash does to ensure
that a new account is registered when account
related options are changed.
Fixes#191794
Lego threw a permission denied error binding to port 80.
AmbientCapabilities with CAP_NET_BIND_SERVICE was required.
Also added a test for this.
Before this change, the description for
security.wrappers.<name>.capabilities made it seem like you could just
string together the names of capabilities like this:
capabilities = "CAP_SETUID,CAP_SETGID";
In reality, each item in the list must be a full-on capability clause:
capabilities = "CAP_SETUID=ep,CAP_SETGID+i";
Summary: fix errors with example code in the manual that shows how to set up DNS-01 verification via the acme protocol, e.g. for those who want to get wildcard certificates from Let's Encrypt.
Fix syntax error in nix arrays (there should not be commas.)
Fix permissions on /var/lib/secrets so it can be read by bind daemon. Without this fix bind won't start.
Add the missing feature: put the generated secret into certs.secret
most of these are hidden because they're either part of a submodule that
doesn't have its type rendered (eg because the submodule type is used in
an either type) or because they are explicitly hidden. some of them are
merely hidden from nix-doc-munge by how their option is put together.
conversions were done using https://github.com/pennae/nix-doc-munge
using (probably) rev f34e145 running
nix-doc-munge nixos/**/*.nix
nix-doc-munge --import nixos/**/*.nix
the tool ensures that only changes that could affect the generated
manual *but don't* are committed, other changes require manual review
and are discarded.
there are sufficiently few variable list around, and they are
sufficiently simple, that it doesn't seem helpful to add another
markdown extension for them. rendering differences are small, except in
the tor module: admonitions inside other blocks cannot be made to work
well with mistune (and likely most other markdown processors), so those
had to be shuffled a bit. we also lose paragraph breaks in the list
items due to how we have to render from markdown to docbook, but once we
remove docbook from the pipeline those paragraph breaks will be restored.
mostly no rendering changes. some lists (like simplelist) don't have an
exact translation to markdown, so we use a comma-separated list of
literals instead.
using regular strings works well for docbook because docbook is not as
whitespace-sensitive as markdown. markdown would render all of these as
code blocks when given the chance.
this renders the same in the manpage and a little more clearly in the
html manual. in the manpage there continues to be no distinction from
regular text, the html manual gets code-type markup (which was probably
the intention for most of these uses anyway).
now nix-doc-munge will not introduce whitespace changes when it replaces
manpage references with the MD equivalent.
no change to the manpage, changes to the HTML manual are whitespace only.
make (almost) all links appear on only a single line, with no
unnecessary whitespace, using double quotes for attributes. this lets us
automatically convert them to markdown easily.
the few remaining links are extremely long link in a gnome module, we'll
come back to those at a later date.
we can't embed syntactic annotations of this kind in markdown code
blocks without yet another extension. replaceable is rare enough to make
this not much worth it, so we'll go with «thing» instead. the module
system already uses this format for its placeholder names in attrsOf
paths.
markdown can't represent the difference without another extension and
both the html manual and the manpage render them the same, so keeping the
distinction is not very useful on its own. with the distinction removed
we can automatically convert many options that use <code> tags to markdown.
the manpage remains unchanged, html manual does not render
differently (but class names on code tags do change from "code" to "literal").
Instead of enabling the PAM modules based on config.krb5.enable,
introduce a new option to control the PAM modules specifically.
Users may want to turn on config.krb5.enable, to get a working Kerberos
client config with tools like kinit, while letting pam_sss or something
else handle Kerberos password lookups.
the conversion procedure is simple:
- find all things that look like options, ie calls to either `mkOption`
or `lib.mkOption` that take an attrset. remember the attrset as the
option
- for all options, find a `description` attribute who's value is not a
call to `mdDoc` or `lib.mdDoc`
- textually convert the entire value of the attribute to MD with a few
simple regexes (the set from mdize-module.sh)
- if the change produced a change in the manual output, discard
- if the change kept the manual unchanged, add some text to the
description to make sure we've actually found an option. if the
manual changes this time, keep the converted description
this procedure converts 80% of nixos options to markdown. around 2000
options remain to be inspected, but most of those fail the "does not
change the manual output check": currently the MD conversion process
does not faithfully convert docbook tags like <code> and <package>, so
any option using such tags will not be converted at all.
Fix bug where pam_u2f options would be partially included in other pam.d
files if the module was enable for specific services, resulting in
broken configuration.
Previously, `pam_unix.so` was `required` to set PAM_AUTHTOK so that
dependent pam modules (such as gnome keyering) could use the password
(for example to unlock a keyring) upon login of the user. This however
broke any additional auth providers (such as AD or LDAP): for any
non-local user `pam_unix.so` will not yield success, thus eventually the
auth would fail (even the following auth providers were actually
executed, they could not overrule the already failed auth).
This change replaces `required` by `optional`. Therefore, the
`pam_unix.so` is executed and can set the PAM_AUTHTOK for the following
optional modules, _even_ if the user is not a local user. Therefore, the
gnome keyring for example is unlocked both for local and additional
users upon login, and login is working for non-local users via
LDAP/AD.
activating the configuration...
setting up /etc...
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.messagebus’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
chown: warning: '.' should be ':': ‘root.root’
reloading user units for root...
pam-ussh allows authorizing using an SSH certificate stored in your
SSH agent, in a similar manner to pam-ssh-agent-auth, but for
certificates rather than raw public keys.
In issue #157787 @martined wrote:
Trying to use confinement on packages providing their systemd units
with systemd.packages, for example mpd, fails with the following
error:
system-units> ln: failed to create symbolic link
'/nix/store/...-system-units/mpd.service': File exists
This is because systemd-confinement and mpd both provide a mpd.service
file through systemd.packages. (mpd got updated that way recently to
use upstream's service file)
To address this, we now place the unit file containing the bind-mounted
paths of the Nix closure into a drop-in directory instead of using the
name of a unit file directly.
This does come with the implication that the options set in the drop-in
directory won't apply if the main unit file is missing. In practice
however this should not happen for two reasons:
* The systemd-confinement module already sets additional options via
systemd.services and thus we should get a main unit file
* In the unlikely event that we don't get a main unit file regardless
of the previous point, the unit would be a no-op even if the options
of the drop-in directory would apply
Another thing to consider is the order in which those options are
merged, since systemd loads the files from the drop-in directory in
alphabetical order. So given that we have confinement.conf and
overrides.conf, the confinement options are loaded before the NixOS
overrides.
Since we're only setting the BindReadOnlyPaths option, the order isn't
that important since all those paths are merged anyway and we still
don't lose the ability to reset the option since overrides.conf comes
afterwards.
Fixes: https://github.com/NixOS/nixpkgs/issues/157787
Signed-off-by: aszlig <aszlig@nix.build>