Currently if you want to properly chroot a systemd service, you could do
it using BindReadOnlyPaths=/nix/store or use a separate derivation which
gathers the runtime closure of the service you want to chroot. The
former is the easier method and there is also a method directly offered
by systemd, called ProtectSystem, which still leaves the whole store
accessible. The latter however is a bit more involved, because you need
to bind-mount each store path of the runtime closure of the service you
want to chroot.
This can be achieved using pkgs.closureInfo and a small derivation that
packs everything into a systemd unit, which later can be added to
systemd.packages.
However, this process is a bit tedious, so the changes here implement
this in a more generic way.
Now if you want to chroot a systemd service, all you need to do is:
{
systemd.services.myservice = {
description = "My Shiny Service";
wantedBy = [ "multi-user.target" ];
confinement.enable = true;
serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice";
};
}
If more than the dependencies for the ExecStart* and ExecStop* (which
btw. also includes script and {pre,post}Start) need to be in the chroot,
it can be specified using the confinement.packages option. By default
(which uses the full-apivfs confinement mode), a user namespace is set
up as well and /proc, /sys and /dev are mounted appropriately.
In addition - and by default - a /bin/sh executable is provided, which
is useful for most programs that use the system() C library call to
execute commands via shell.
Unfortunately, there are a few limitations at the moment. The first
being that DynamicUser doesn't work in conjunction with tmpfs, because
systemd seems to ignore the TemporaryFileSystem option if DynamicUser is
enabled. I started implementing a workaround to do this, but I decided
to not include it as part of this pull request, because it needs a lot
more testing to ensure it's consistent with the behaviour without
DynamicUser.
The second limitation/issue is that RootDirectoryStartOnly doesn't work
right now, because it only affects the RootDirectory option and doesn't
include/exclude the individual bind mounts or the tmpfs.
A quirk we do have right now is that systemd tries to create a /usr
directory within the chroot, which subsequently fails. Fortunately, this
is just an ugly error and not a hard failure.
The changes also come with a changelog entry for NixOS 19.03, which is
why I asked for a vote of the NixOS 19.03 stable maintainers whether to
include it (I admit it's a bit late a few days before official release,
sorry for that):
@samueldr:
Via pull request comment[1]:
+1 for backporting as this only enhances the feature set of nixos,
and does not (at a glance) change existing behaviours.
Via IRC:
new feature: -1, tests +1, we're at zero, self-contained, with no
global effects without actively using it, +1, I think it's good
@lheckemann:
Via pull request comment[2]:
I'm neutral on backporting. On the one hand, as @samueldr says,
this doesn't change any existing functionality. On the other hand,
it's a new feature and we're well past the feature freeze, which
AFAIU is intended so that new, potentially buggy features aren't
introduced in the "stabilisation period". It is a cool feature
though? :)
A few other people on IRC didn't have opposition either against late
inclusion into NixOS 19.03:
@edolstra: "I'm not against it"
@Infinisil: "+1 from me as well"
@grahamc: "IMO its up to the RMs"
So that makes +1 from @samueldr, 0 from @lheckemann, 0 from @edolstra
and +1 from @Infinisil (even though he's not a release manager) and no
opposition from anyone, which is the reason why I'm merging this right
now.
I also would like to thank @Infinisil, @edolstra and @danbst for their
reviews.
[1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-477322127
[2]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-477548395
Currently, if you want to properly chroot a systemd service, you could
do it using BindReadOnlyPaths=/nix/store (which is not what I'd call
"properly", because the whole store is still accessible) or use a
separate derivation that gathers the runtime closure of the service you
want to chroot. The former is the easier method and there is also a
method directly offered by systemd, called ProtectSystem, which still
leaves the whole store accessible. The latter however is a bit more
involved, because you need to bind-mount each store path of the runtime
closure of the service you want to chroot.
This can be achieved using pkgs.closureInfo and a small derivation that
packs everything into a systemd unit, which later can be added to
systemd.packages. That's also what I did several times[1][2] in the
past.
However, this process got a bit tedious, so I decided that it would be
generally useful for NixOS, so this very implementation was born.
Now if you want to chroot a systemd service, all you need to do is:
{
systemd.services.yourservice = {
description = "My Shiny Service";
wantedBy = [ "multi-user.target" ];
chroot.enable = true;
serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice";
};
}
If more than the dependencies for the ExecStart* and ExecStop* (which
btw. also includes "script" and {pre,post}Start) need to be in the
chroot, it can be specified using the chroot.packages option. By
default (which uses the "full-apivfs"[3] confinement mode), a user
namespace is set up as well and /proc, /sys and /dev are mounted
appropriately.
In addition - and by default - a /bin/sh executable is provided as well,
which is useful for most programs that use the system() C library call
to execute commands via shell. The shell providing /bin/sh is dash
instead of the default in NixOS (which is bash), because it's way more
lightweight and after all we're chrooting because we want to lower the
attack surface and it should be only used for "/bin/sh -c something".
Prior to submitting this here, I did a first implementation of this
outside[4] of nixpkgs, which duplicated the "pathSafeName" functionality
from systemd-lib.nix, just because it's only a single line.
However, I decided to just re-use the one from systemd here and
subsequently made it available when importing systemd-lib.nix, so that
the systemd-chroot implementation also benefits from fixes to that
functionality (which is now a proper function).
Unfortunately, we do have a few limitations as well. The first being
that DynamicUser doesn't work in conjunction with tmpfs, because it
already sets up a tmpfs in a different path and simply ignores the one
we define. We could probably solve this by detecting it and try to
bind-mount our paths to that different path whenever DynamicUser is
enabled.
The second limitation/issue is that RootDirectoryStartOnly doesn't work
right now, because it only affects the RootDirectory option and not the
individual bind mounts or our tmpfs. It would be helpful if systemd
would have a way to disable specific bind mounts as well or at least
have some way to ignore failures for the bind mounts/tmpfs setup.
Another quirk we do have right now is that systemd tries to create a
/usr directory within the chroot, which subsequently fails. Fortunately,
this is just an ugly error and not a hard failure.
[1]: https://github.com/headcounter/shabitica/blob/3bb01728a0237ad5e7/default.nix#L43-L62
[2]: https://github.com/aszlig/avonc/blob/dedf29e092481a33dc/nextcloud.nix#L103-L124
[3]: The reason this is called "full-apivfs" instead of just "full" is
to make room for a *real* "full" confinement mode, which is more
restrictive even.
[4]: https://github.com/aszlig/avonc/blob/92a20bece4df54625e/systemd-chroot.nix
Signed-off-by: aszlig <aszlig@nix.build>
This should make the composability of kernel configurations more straigthforward.
- now distinguish freeform options from tristate ones
- will look for a structured config in kernelPatches too
one can now access the structuredConfig from a kernel via linux_test.configfile.structuredConfig
in order to reinject it into another kernel, no need to rewrite the config from scratch
The following merge strategies are used in case of conflict:
-- freeform items must be equal or they conflict (mergeEqualOption)
-- for tristate (y/m/n) entries, I use the mergeAnswer strategy which takes the best available value, "best" being defined by the user (by default "y" > "m" > "n", e.g. if one entry is both marked "y" and "n", "y" wins)
-- if one item is both marked optional/mandatory, mandatory wins (mergeFalseByDefault)
Right now it's not at all obvious that one can override this option
using `services.logind.extraConfig`; we might as well add an option
for `killUserProcesses` directly so it's clear and documented.
The new reuse behaviour is cool and really useful but it breaks one of
my use cases. When using kexec, I have a script which will unlock the
disks in my initrd. However, do_open_passphrase will fail if the disk is
already unlocked.
The previous version contained a false positive case, where boot would
continue when the stage 2 init did not exist at all, and a false
negative case, where boot would stop if the stage 2 init was a symlink
which cannot be resolved in the initramfs root.
Fixes#49519.
Thanks @michas2 for finding and reporting the issue!
* journald: forward message to syslog by default if a syslog implementation is installed
* added a test to ensure rsyslog is receiving messages when expected
* added rsyslogd tests to release.nix
* Lets container@.service be activated by machines.target instead of
multi-user.target
According to the systemd manpages, all containers that are registered
by machinectl, should be inside machines.target for easy stopping
and starting container units altogether
* make sure container@.service and container.slice instances are
actually located in machine.slice
https://plus.google.com/112206451048767236518/posts/SYAueyXHeEX
See original commit: https://github.com/NixOS/systemd/commit/45d383a3b8
* Enable Cgroup delegation for nixos-containers
Delegate=yes should be set for container scopes where a systemd instance
inside the container shall manage the hierarchies below its own cgroup
and have access to all controllers.
This is equivalent to enabling all accounting options on the systemd
process inside the system container. This means that systemd inside
the container is responsible for managing Cgroup resources for
unit files that enable accounting options inside. Without this
option, units that make use of cgroup features within system
containers might misbehave
See original commit: https://github.com/NixOS/systemd/commit/a931ad47a8
from the manpage:
Turns on delegation of further resource control partitioning to
processes of the unit. Units where this is enabled may create and
manage their own private subhierarchy of control groups below the
control group of the unit itself. For unprivileged services (i.e.
those using the User= setting) the unit's control group will be made
accessible to the relevant user. When enabled the service manager
will refrain from manipulating control groups or moving processes
below the unit's control group, so that a clear concept of ownership
is established: the control group tree above the unit's control
group (i.e. towards the root control group) is owned and managed by
the service manager of the host, while the control group tree below
the unit's control group is owned and managed by the unit itself.
Takes either a boolean argument or a list of control group
controller names. If true, delegation is turned on, and all
supported controllers are enabled for the unit, making them
available to the unit's processes for management. If false,
delegation is turned off entirely (and no additional controllers are
enabled). If set to a list of controllers, delegation is turned on,
and the specified controllers are enabled for the unit. Note that
additional controllers than the ones specified might be made
available as well, depending on configuration of the containing
slice unit or other units contained in it. Note that assigning the
empty string will enable delegation, but reset the list of
controllers, all assignments prior to this will have no effect.
Defaults to false.
Note that controller delegation to less privileged code is only safe
on the unified control group hierarchy. Accordingly, access to the
specified controllers will not be granted to unprivileged services
on the legacy hierarchy, even when requested.
The following controller names may be specified: cpu, cpuacct, io,
blkio, memory, devices, pids. Not all of these controllers are
available on all kernels however, and some are specific to the
unified hierarchy while others are specific to the legacy hierarchy.
Also note that the kernel might support further controllers, which
aren't covered here yet as delegation is either not supported at all
for them or not defined cleanly.
"machine.target" doesn't actually exist, it's misspelled version
of "machines.target". However, the "systemd-nspawn@.service"
unit already has a default dependency on "machines.target"
The improved lspci command shows all available ethernet controllers and
their kernel modules. Previously, the user had to provide the slot name
of a specific device.
The default value for journald's Storage option is "auto", which
determines whether to log to /var/log/journal based on whether that
directory already exists. So NixOS has been unconditionally creating
that directory in activation scripts.
However, we can get the same behavior by configuring journald.conf to
set Storage to "persistent" instead. In that case, journald will create
the directory itself if necessary.
Previously, the activation script was responsible for ensuring that
/etc/machine-id exists. However, the only time it could not already
exist is during stage-2-init, not while switching configurations,
because one of the first things systemd does when starting up as PID 1
is to create this file. So I've moved the initialization to
stage-2-init.
Furthermore, since systemd will do the equivalent of
systemd-machine-id-setup if /etc/machine-id doesn't have valid contents,
we don't need to do that ourselves.
We _do_, however, want to ensure that the file at least exists, because
systemd also uses the non-existence of this file to guess that this is a
first-boot situation. In that case, systemd tries to create some
symlinks in /etc/systemd/system according to its presets, which it can't
do because we've already populated /etc according to the current NixOS
configuration.
This is not necessary for any other activation script snippets, so it's
okay to do it after stage-2-init runs the activation script. None of
them declare a dependency on the "systemd" snippet. Also, most of them
only create files or directories in ways that obviously don't need the
machine-id set.
Evaluation error introduced in 599c4df46a.
There is only a "platformS" attribute in kexectools.meta, so let's use
this and from the code in the kexec module it operates on a list,
matching the corresponding platforms, so this seems to be the attribute
the original author intended.
Tested by building nixos/tests/kexec.nix on x86_64-linux and while it
evaluates now, the test still fails by timing out shortly after the
kexec:
machine: waiting for the VM to finish booting
machine# Cannot find the ESP partition mount point.
This however seems to be an unrelated issue and was also the case before
the commit mentioned above.
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @edolstra, @dezgeg
* acquire DHCP on the interfaces with networking.interface.$name.useDHCP == true or on all interfaces if networking.useDHCP == true (was only only "eth0")
* respect "mtu" if it was in DHCP answer (it happens in the wild)
* acquire and set up staticroutes (unlike others clients, udhcpc does not do the query by default); this supersedes https://github.com/NixOS/nixpkgs/pull/41829
Although double '/' in paths is not a problem for GRUB supplied with nixpkgs, sometimes NixOS's grub.conf read by external GRUB and there are versions of GRUB which fail
The background color option is self-explanatory.
The mode is either `normal` or `stretch`, they are as defined by GRUB,
where normal will put the image in the top-left corner of the menu, and
stretch is the default, where it stretches the image without
consideration for the aspect ratio.
* https://www.gnu.org/software/grub/manual/grub/grub.html#background_005fimage
This fixes an issue where setting both
`boot.loader.systemd-boot.editor` to `false` and
`boot.loader.systemd-boot.consoleMode` to any value would concatenate
the two configuration lines in the output, resulting in an invalid
`loader.conf`.
From reading the source I'm pretty sure it doesn't support multiple Yubikeys, hence
those options are useless.
Also, I'm pretty sure nobody actually uses this feature, because enabling it causes
extra utils' checks to fail (even before applying any patches of this branch).
As I don't have the hardware to test this, I'm too lazy to fix the utils, but
I did test that with extra utils checks commented out and Yubikey
enabled the resulting script still passes the syntax check.
Also reuse common cryptsetup invocation subexpressions.
- Passphrase reading is done via the shell now, not by cryptsetup.
This way the same passphrase can be reused between cryptsetup
invocations, which this module now tries to do by default (can be
disabled).
- Number of retries is now infinity, it makes no sense to make users
reboot when they fail to type in their passphrase.
Rather than special-casing the dns options in networkmanager.nix, use
the module system to let unbound and systemd-resolved contribute to
the newtorkmanager config.
find-libs is currently choking when it finds the dynamic linker
as a DT_NEEDED dependency (from glibc) and bails out like this
(as glibc doesn't have a RPATH):
Couldn't satisfy dependency ld-linux-x86-64.so.2
Actually the caller of find-libs ignores the exit status, so the issue
almost always goes unnoticed and happens to work by chance. But
additionally what happens is that indirect .so dependencies are
left out from the dependency closure calculation, which breaks
latest cryptsetup as libssl.so isn't found anymore.
F2FS is used on Raspberry Pi-like devices to enhance SD card performance. Allowing F2FS resizing would help in automatic deploying of SD card images without a Linux box to resize the file system offline.
This has been reported by @qknight in his Stack Overflow question:
https://stackoverflow.com/q/50678639
The correct way to override a single value would be to use something
like this:
systemd.services.nagios.serviceConfig.Restart = lib.mkForce "no";
However, this doesn't work because the check is applied for the attrsOf
type and thus the attribute values might still contain the attribute set
created by mkOverride.
The unitOption type however did already account for this, but at this
stage it's already too late.
So now the actual value is unpacked while checking the values of the
attribute set, which should allow us to override values in
serviceConfig.
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @edolstra, @qknight
When a package contains a directory in one of the systemd directories
(like flatpak does), it is symlinked into the *-units derivation.
Then later, the derivation will try to create the directory, which
will fail:
mkdir: cannot create directory '/nix/store/…-user-units/dbus.service.d': File exists
builder for '/nix/store/…-user-units.drv' failed with exit code 1
Closes: #33233
GRUB 2.0 supports png, jpeg and tga. This will use the image's suffix to
load the right module.
As jpeg module is named jpeg, jpg is renamed jpeg.
If the user uses wrong image suffix for an image, it wouldn't work anyway.
This will leave up to two additional left-over files in /boot/ if user switches
through all the supported file formats. The module already left the png
image if the user disabled the splash image.