Vulnerabilities caused by argv[0] mishandling in privileged code keep coming
up; recent examples are CVE-2021-4034 in polkit and CVE-2023-6246 in glibc. On the other
hand, legitimate handling of argv[0] is mostly limited to logging and
multiplexing different functionality depending on the basename of the link (an
example for the latter is sudo/sudoedit).
On NixOS, by far the most common source of untrusted argv[0] to privileged
processes should be the wrapper, and it is not used for multiplexing (separate
wrappers are used instead). So we always pass the path of the wrapped program
as argv[0]. Obsolete mitigations for older argv[0]-based issues are deleted.
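
The argv[0] policy described above can be sketched as follows. This is an illustration in Python, not the wrapper's actual code; `build_target_argv` and the example paths are hypothetical.

```python
def build_target_argv(target_path, caller_argv):
    """Return the argv to hand to the wrapped program.

    The caller-controlled argv[0] is discarded and replaced by the path of
    the wrapped program; the remaining arguments are passed through
    unchanged. An empty caller argv (argc == 0) is handled as well.
    """
    return [target_path] + list(caller_argv[1:])


if __name__ == "__main__":
    # Hypothetical call: the caller supplied a misleading argv[0].
    print(build_target_argv("/path/to/wrapped/program", ["something-else", "-l"]))
```

Because argv[0] is never taken from the caller, an empty or attacker-chosen argv[0] cannot reach the privileged program.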
Would otherwise fail with
```
error: A definition for option `systemd.services.auditd.conflicts."[definition 1-entry 1]"' is not of type `string matching the pattern [a-zA-Z0-9@%:_.\-]+[.](service|socket|device|mount|automount|swap|target|path|timer|scope|slice)'. Definition values:
- In `/nix/store/x2khl2yx0vz2i357x7mz5xm1kagql8ag-source/nixos/modules/security/auditd.nix': "shutdown.target "
```
Makes it possible to override properties of a rule by name. Introduces
an 'order' field that can be overridden to change the sequence of rules.
For now, the order value for each built-in rule is derived from its
place in the hardcoded list of rules.
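
A rough sketch of the idea, in Python rather than Nix so that it can be run directly. The rule names, the spacing of the order values, and the helper names are assumptions made for illustration; they are not taken from the module.

```python
# Hypothetical built-in rules; each order value is derived from list position.
BUILTIN_NAMES = ["unix", "deny"]
BUILTIN = {name: {"order": (i + 1) * 100} for i, name in enumerate(BUILTIN_NAMES)}


def merge_rules(builtin, overrides):
    """Merge per-rule overrides into the built-in rules, keyed by rule name."""
    merged = {name: dict(rule) for name, rule in builtin.items()}
    for name, override in overrides.items():
        merged.setdefault(name, {}).update(override)
    return merged


def ordered_rule_names(rules):
    """Return rule names sorted by their 'order' field (ties broken by name)."""
    return sorted(rules, key=lambda name: (rules[name]["order"], name))


if __name__ == "__main__":
    # Overriding only 'order' moves the rule without redefining it.
    print(ordered_rule_names(merge_rules(BUILTIN, {"deny": {"order": 50}})))
```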
Adds easily overrideable settings for the most common PAM argument
styles. These are:
- Flag (e.g. "use_first_pass"): rendered for true boolean values. false
values are ignored.
- Key-value (e.g. "action=validate"): rendered for non-null, non-boolean
values.
Most PAM arguments can be configured this way. Others can still be
configured with the 'args' option.
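
The two rendering styles can be sketched like this. It is a Python illustration of the rules stated above, not the Nix implementation; `render_args` is a hypothetical name.

```python
def render_args(settings):
    """Render a settings mapping into a list of PAM module arguments.

    - True      -> bare flag, e.g. "use_first_pass"
    - False     -> ignored
    - None      -> ignored
    - otherwise -> "key=value", e.g. "action=validate"
    """
    args = []
    for key, value in settings.items():
        if value is None or value is False:
            continue
        if value is True:
            args.append(key)
        else:
            args.append(f"{key}={value}")
    return args


if __name__ == "__main__":
    print(render_args({"use_first_pass": True, "action": "validate"}))
```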
PIE causes problems with static binaries on ARM (see 76552e9). It is
enabled by default on other platforms anyway when musl is used, so we
don't need to specify it manually.
These names are internal identifiers. They will be used as keys so that
users can reconfigure rules by merging a rule config with the same name.
The name is arbitrary. The built-in rules are named after the PAM where
practical.
Eliminates a redundancy between the 'rules' suboptions and the type
specified in each rule.
We eventually want to give each rule a name so that we can merge config
overrides. The PAM name is a natural choice for rule name, but a PAM is
often used in multiple rule types. Organizing rules by type and rule
name avoids name collisions.
This mitigates CVE-2023-4911, crucially without a mass-rebuild.
We drop insecure environment variables explicitly, including
glibc-specific ones, since musl doesn't do this by default.
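
As an illustration only, explicit scrubbing of the environment might look like the following Python sketch. The variable list here is deliberately partial and chosen for the example; the list actually used by the wrapper is not reproduced, and `scrub_environment` is a hypothetical name.

```python
# Partial, illustrative list of loader-related variables (assumption: the
# real list is longer and is not reproduced here).
UNSECURE_ENVVARS = ("LD_PRELOAD", "LD_LIBRARY_PATH", "LD_AUDIT", "GLIBC_TUNABLES")


def scrub_environment(env):
    """Return a copy of env without the variables listed above."""
    return {key: value for key, value in env.items() if key not in UNSECURE_ENVVARS}


if __name__ == "__main__":
    print(scrub_environment({"PATH": "/bin", "GLIBC_TUNABLES": "x=y"}))
```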
Change-Id: I591a817e6d4575243937d9ccab51c23a96bed6f9
This is just a quick fix based on pname,
as I have no idea how to use splicing in the module.
We should instead use splicing to get the package for the host.
From systemd 243 release note[1]:
This release enables unprivileged programs (i.e. requiring neither
setuid nor file capabilities) to send ICMP Echo (i.e. ping) requests
by turning on the "net.ipv4.ping_group_range" sysctl of the Linux
kernel for the whole UNIX group range, i.e. all processes.
So this wrapper is not needed any more.
See also [2] and [3].
This patch also removes:
- apparmor profiles in NixOS for ping itself and the wrapped one
- other references for the wrapped ping
[1]: 8e2d9d40b3/NEWS (L6457-L6464)
[2]: https://github.com/systemd/systemd/pull/13141
[3]: https://fedoraproject.org/wiki/Changes/EnableSysctlPingGroupRange
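
For reference, the sysctl holds two group IDs, a lower and an upper bound, and a process may send ICMP Echo requests without extra privileges when one of its groups falls inside that range. A small Python sketch of that check (the helper name `gid_may_ping` is hypothetical):

```python
def gid_may_ping(range_text, gid):
    """Check a gid against the contents of net.ipv4.ping_group_range.

    range_text is the sysctl value: two integers separated by whitespace.
    A range whose lower bound exceeds its upper bound (such as "1 0")
    matches no group at all.
    """
    low, high = (int(field) for field in range_text.split())
    return low <= gid <= high


if __name__ == "__main__":
    # On Linux the live value can be read from procfs.
    with open("/proc/sys/net/ipv4/ping_group_range") as handle:
        print(gid_may_ping(handle.read(), 1000))
```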
This is preferable even for regular `sudo`, but will ensure the check is useful
when using `sudo-rs` in the future.
Also, dropped antediluvian comment about the syntax check being disabled,
when it was clearly not commented out:
- introduced in 2007, commit 6d65f0ae03ae14f3e978d89959253d9a8f5e0ec1;
- reverted in 2014, commit e68a5b265a,
but without amending the comments.
Fixes #232505
Implements the new option `security.acme.maxConcurrentRenewals` to limit
the number of certificate generation (or renewal) jobs that can run in
parallel. This avoids overloading the system resources with many
certificates or running into acme registry rate limits and network
timeouts.
Architecture considerations:
- simplicity, lightweight: Concerns have been voiced about making this
already rather complex module even more convoluted. Additionally,
locking solutions should not significantly increase the runtime overhead
and footprint of individual job runs.
To accommodate these concerns, this solution is implemented purely in
Nix, bash, and using the light-weight `flock` util. To reduce
complexity, jobs are already assigned their lockfile slot at system
build time instead of dynamic locking and retrying. This comes at the
cost of not always maxing out the permitted concurrency at runtime.
- no stale locks: Limiting concurrency via a locking mechanism is usually
approached with semaphores. Unfortunately, both SysV as well as
POSIX-Semaphores are *not* released when the process currently locking
them is SIGKILLed. This poses the danger of stale locks staying around
and certificate renewal being blocked from running altogether.
`flock` locks though are released when the process holding the file
descriptor of the lock file is KILLed or terminated.
- lockfile generation: Lock files could either be created at build time
in the Nix store or at script runtime in an idempotent manner.
While the former would be simpler to achieve, we might exceed the number
of permitted concurrent runs during a system switch: Already running
jobs are still locked on the existing lock files, while jobs started
after the system switch will acquire locks on freshly created files,
not being blocked by the still running services.
For this reason, locks are generated and managed at runtime in the
shared state directory `/var/lib/locks/`.
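
The design above can be sketched as follows. This is a Python illustration of static slot assignment combined with `flock`, not the Nix/bash implementation; the function names and the lock file naming are assumptions.

```python
import fcntl
import os


def assign_slots(job_names, max_concurrent):
    """Statically map each job to one of max_concurrent lock slots."""
    return {name: i % max_concurrent for i, name in enumerate(sorted(job_names))}


def run_locked(lock_dir, slot, action):
    """Run action() while holding an exclusive flock on the slot's lock file."""
    path = os.path.join(lock_dir, f"{slot}.lock")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the slot is free
        return action()
    finally:
        # Closing the descriptor releases the lock; the kernel does the same
        # if the process is killed, so no stale lock is left behind.
        os.close(fd)
```

Since slots are fixed ahead of time, two jobs that share a slot run one after the other even when another slot is idle, which is the trade-off mentioned above.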
nixos/security/acme: move locks to /run
also, move over permission and directory management to systemd-tmpfiles
nixos/security/acme: fix some linter remarks in my code
there are some remarks left for existing code, not touching that
nixos/security/acme: redesign script locking flow
- get rid of subshell
- provide function for wrapping scripts in a locked environment
nixos/acme: improve visibility of blocking on locks
nixos/acme: add smoke test for concurrency limitation
heavily inspired by m1cr0man
nixos/acme: release notes entry on new concurrency limits
nixos/acme: cleanup, clarifications
This is not unlikely to happen, given the enthusiasm shown by some users,
but we are not there yet, and this will save them from breaking their system.
Given that we are no longer inspecting the target of the /proc/self/exe
symlink, stop asserting that it has any properties. Remove the plumbing
for wrappersDir, which is no longer used.
Asserting that the binary is located in the specific place is no longer
necessary, because we don't rely on that location being writable only by
privileged entities (we used to rely on that when assuming that
readlink(/proc/self/exe) will continue to point at us and when assuming
that the `.real` file can be trusted).
Assertions about lack of write bits on the file were
IMO meaningless since inception: ignoring Linux's refusal to honor
S[UG]ID bits on files-writeable-by-others, if someone could have
modified the wrapper in a way that preserved the capability or S?ID
bits, they could just remove this check.
Assertions about effective UID were IMO just harmful: if we were
executed without elevation, the caller would expect the same result as
in a wrapperless distro: the target gets executed without
elevation. Due to lack of elevation, that cannot be used to abuse
privileges that the elevation would give.
This change partially fixes #98863 for S[UG]ID wrappers. The issue for
capability wrappers remains.
/proc/self/exe is a "fake" symlink. When it's opened, it always opens
the actual file that was execve()d in this process, even if the file was
deleted or renamed; if the file is no longer accessible from the current
chroot/mount namespace it will at the very worst fail and never open the
wrong file. Thus, we can make a much simpler argument that we're reading
capabilities off the correct file after this change (and that argument
doesn't rely on things such as protected_hardlinks being enabled, or no
users being able to write to /run/wrappers, or the verification that the
path readlink returns starts with /run/wrappers/).
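
To illustrate the property relied on above, here is a small Linux-only Python sketch. It opens /proc/self/exe directly and inspects the resulting file descriptor. As a stand-in for reading capabilities it only looks at the mode bits and owner, and when run under Python the file in question is the interpreter binary. The function names are hypothetical.

```python
import os
import stat


def open_own_executable():
    """Open the file that was execve()d for this process (Linux only)."""
    return os.open("/proc/self/exe", os.O_RDONLY)


def describe_own_executable():
    """Return basic facts about the executable, read from the open descriptor."""
    fd = open_own_executable()
    try:
        st = os.fstat(fd)  # refers to the opened file, not to a path lookup
        return {
            "is_regular": stat.S_ISREG(st.st_mode),
            "owner_uid": st.st_uid,
            "setuid": bool(st.st_mode & stat.S_ISUID),
        }
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(describe_own_executable())
```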
Before this change it was crucial that nonprivileged users are unable to
create hardlinks to SUID wrappers, lest they be able to provide a
different `.real` file alongside. That was ensured by not providing a
location writable to them in the /run/wrappers tmpfs, (unless
disabled) by the fs.protected_hardlinks=1 sysctl, and by the explicit
own-path check in the wrapper. After this change, ensuring
that property is no longer important, and the check is most likely
redundant.
The simplification of expectations of the wrapper will make it
easier to remove some of the assertions in the wrapper (which currently
cause the wrapper to fail in no_new_privs environments, instead of
executing the target with non-elevated privileges).
Note that wrappers had to be copied (not symlinked) into /run/wrappers
due to the SUID/capability bits, and they couldn't be hard/softlinks of
each other due to those bits potentially differing. Thus, this change
doesn't increase the amount of memory used by /run/wrappers.
This change removes part of the test that is obsoleted by the removal of
`.real` files.
This change includes some stuff (e.g. reading of the `.real` file,
execution of the wrapper's target) that belongs to the apparmor policy
of the wrapper. This necessitates making them distinct for each wrapper.
The main reason for this change is as a preparation for making each
wrapper be a distinct binary.
In user namespaces where an unprivileged user is mapped as root and root
is unmapped, setuid bits have no effect. However setuid root
executables like mount are still usable *in the namespace* as the user
already has the required privileges. This commit detects the situation
where the wrapper gained no privileges that the parent process did not
already have and in this case does less sanity checking. In short there
is no need to be picky since the parent already can execute the foo.real
executable themselves.
Details:
man 7 user_namespaces:
Set-user-ID and set-group-ID programs
When a process inside a user namespace executes a set-user-ID
(set-group-ID) program, the process's effective user (group) ID
inside the namespace is changed to whatever value is mapped for
the user (group) ID of the file. However, if either the user or
the group ID of the file has no mapping inside the namespace, the
set-user-ID (set-group-ID) bit is silently ignored: the new
program is executed, but the process's effective user (group) ID
is left unchanged. (This mirrors the semantics of executing a
set-user-ID or set-group-ID program that resides on a filesystem
that was mounted with the MS_NOSUID flag, as described in
mount(2).)
The effect of the setuid bit is that the real user id is preserved and
the effective and set user ids are changed to the owner of the wrapper.
We detect that no privilege was gained by checking that euid == suid
== ruid. In this case we stop checking that euid == owner of the
wrapper file.
As a reminder here are the values of euid, ruid, suid, stat.st_uid and
stat.st_mode & S_ISUID in various cases when running a setuid 42 executable as user 1000:
Normal case:
ruid=1000 euid=42 suid=42
setuid=2048, st_uid=42
nosuid mount:
ruid=1000 euid=1000 suid=1000
setuid=2048, st_uid=42
inside unshare -rm:
ruid=0 euid=0 suid=0
setuid=2048, st_uid=65534
inside unshare -rm, on a suid mount:
ruid=0 euid=0 suid=0
setuid=2048, st_uid=65534
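
The detection described above, expressed as a short Python sketch and checked against the cases listed. The function names are hypothetical; the wrapper's own code is not reproduced here.

```python
import os


def gained_no_privileges(ruid, euid, suid):
    """True when the setuid bit had no effect: all three ids are equal."""
    return ruid == euid == suid


def must_check_owner(ruid, euid, suid):
    """The 'euid == owner of the wrapper file' check is only applied when
    executing the wrapper actually changed the process's ids."""
    return not gained_no_privileges(ruid, euid, suid)


if __name__ == "__main__":
    # os.getresuid() returns (ruid, euid, suid) for the current process.
    print(gained_no_privileges(*os.getresuid()))
```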
The abstraction/nameservice profile from apparmor-profiles package
includes abstractions/nss-systemd. Without "reexporting" it,
the include fails and we get some errors.
A bug in the pam_mount module meant that crypt mount options were
not passed to the mount.crypt command. This is now fixed and
additionally, a cryptMountOptions NixOS option is added to define mount
options that should apply to all crypt mounts.
Fixes #230920