nixpkgs/nixos/tests/systemd-confinement/default.nix

import ../make-test-python.nix {
  name = "systemd-confinement";
nixos: Add 'chroot' options to systemd.services Currently, if you want to properly chroot a systemd service, you could do it using BindReadOnlyPaths=/nix/store (which is not what I'd call "properly", because the whole store is still accessible) or use a separate derivation that gathers the runtime closure of the service you want to chroot. The former is the easier method and there is also a method directly offered by systemd, called ProtectSystem, which still leaves the whole store accessible. The latter however is a bit more involved, because you need to bind-mount each store path of the runtime closure of the service you want to chroot. This can be achieved using pkgs.closureInfo and a small derivation that packs everything into a systemd unit, which later can be added to systemd.packages. That's also what I did several times[1][2] in the past. However, this process got a bit tedious, so I decided that it would be generally useful for NixOS, so this very implementation was born. Now if you want to chroot a systemd service, all you need to do is: { systemd.services.yourservice = { description = "My Shiny Service"; wantedBy = [ "multi-user.target" ]; chroot.enable = true; serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice"; }; } If more than the dependencies for the ExecStart* and ExecStop* (which btw. also includes "script" and {pre,post}Start) need to be in the chroot, it can be specified using the chroot.packages option. By default (which uses the "full-apivfs"[3] confinement mode), a user namespace is set up as well and /proc, /sys and /dev are mounted appropriately. In addition - and by default - a /bin/sh executable is provided as well, which is useful for most programs that use the system() C library call to execute commands via shell. 
The shell providing /bin/sh is dash instead of the default in NixOS (which is bash), because it's way more lightweight and after all we're chrooting because we want to lower the attack surface and it should be only used for "/bin/sh -c something". Prior to submitting this here, I did a first implementation of this outside[4] of nixpkgs, which duplicated the "pathSafeName" functionality from systemd-lib.nix, just because it's only a single line. However, I decided to just re-use the one from systemd here and subsequently made it available when importing systemd-lib.nix, so that the systemd-chroot implementation also benefits from fixes to that functionality (which is now a proper function). Unfortunately, we do have a few limitations as well. The first being that DynamicUser doesn't work in conjunction with tmpfs, because it already sets up a tmpfs in a different path and simply ignores the one we define. We could probably solve this by detecting it and try to bind-mount our paths to that different path whenever DynamicUser is enabled. The second limitation/issue is that RootDirectoryStartOnly doesn't work right now, because it only affects the RootDirectory option and not the individual bind mounts or our tmpfs. It would be helpful if systemd would have a way to disable specific bind mounts as well or at least have some way to ignore failures for the bind mounts/tmpfs setup. Another quirk we do have right now is that systemd tries to create a /usr directory within the chroot, which subsequently fails. Fortunately, this is just an ugly error and not a hard failure. [1]: https://github.com/headcounter/shabitica/blob/3bb01728a0237ad5e7/default.nix#L43-L62 [2]: https://github.com/aszlig/avonc/blob/dedf29e092481a33dc/nextcloud.nix#L103-L124 [3]: The reason this is called "full-apivfs" instead of just "full" is to make room for a *real* "full" confinement mode, which is more restrictive even. 
[4]: https://github.com/aszlig/avonc/blob/92a20bece4df54625e/systemd-chroot.nix Signed-off-by: aszlig <aszlig@nix.build>
2019-03-10 11:21:55 +00:00
2022-03-20 23:15:30 +00:00
  nodes.machine = { pkgs, lib, ... }: let
    testLib = pkgs.python3Packages.buildPythonPackage {
      name = "confinement-testlib";
      unpackPhase = ''
        cat > setup.py <<EOF
        from setuptools import setup
        setup(name='confinement-testlib', py_modules=["checkperms"])
        EOF
        cp ${./checkperms.py} checkperms.py
      '';
    };
    mkTest = name: testScript: pkgs.writers.writePython3 "${name}.py" {
      libraries = [ pkgs.python3Packages.pytest testLib ];
    } ''
# This runs our test script by using pytest's assertion rewriting, so
# that whenever we use "assert <something>", the actual values are
# printed rather than getting a generic AssertionError or the need to
# pass an explicit assertion error message.
import ast
from pathlib import Path
from _pytest.assertion.rewrite import rewrite_asserts
script = Path('${pkgs.writeText "${name}-main.py" ''
import errno, os, pytest, signal
from subprocess import run
from checkperms import Accessibility, assert_permissions
${testScript}
''}') # noqa
filename = str(script)
source = script.read_bytes()
tree = ast.parse(source, filename=filename)
rewrite_asserts(tree, source, filename)
exec(compile(tree, filename, 'exec', dont_inherit=True))
'';
nixos/systemd-confinement: Allow shipped unit file In issue #157787 @martined wrote: Trying to use confinement on packages providing their systemd units with systemd.packages, for example mpd, fails with the following error: system-units> ln: failed to create symbolic link '/nix/store/...-system-units/mpd.service': File exists This is because systemd-confinement and mpd both provide a mpd.service file through systemd.packages. (mpd got updated that way recently to use upstream's service file) To address this, we now place the unit file containing the bind-mounted paths of the Nix closure into a drop-in directory instead of using the name of a unit file directly. This does come with the implication that the options set in the drop-in directory won't apply if the main unit file is missing. In practice however this should not happen for two reasons: * The systemd-confinement module already sets additional options via systemd.services and thus we should get a main unit file * In the unlikely event that we don't get a main unit file regardless of the previous point, the unit would be a no-op even if the options of the drop-in directory would apply Another thing to consider is the order in which those options are merged, since systemd loads the files from the drop-in directory in alphabetical order. So given that we have confinement.conf and overrides.conf, the confinement options are loaded before the NixOS overrides. Since we're only setting the BindReadOnlyPaths option, the order isn't that important since all those paths are merged anyway and we still don't lose the ability to reset the option since overrides.conf comes afterwards. Fixes: https://github.com/NixOS/nixpkgs/issues/157787 Signed-off-by: aszlig <aszlig@nix.build>
2022-02-02 12:12:49 +00:00
    mkTestStep = num: {
      description,
      testScript,
      config ? {},
      serviceName ? "test${toString num}",
      rawUnit ? null,
    }: {
      systemd.packages = lib.optional (rawUnit != null) (pkgs.writeTextFile {
        name = serviceName;
        destination = "/etc/systemd/system/${serviceName}.service";
        text = rawUnit;
      });
      systemd.services.${serviceName} = {
        inherit description;
        requiredBy = [ "multi-user.target" ];
        confinement = (config.confinement or {}) // { enable = true; };
        serviceConfig = (config.serviceConfig or {}) // {
          ExecStart = mkTest serviceName testScript;
          Type = "oneshot";
        };
      } // removeAttrs config [ "confinement" "serviceConfig" ];
};
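    # A minimal sketch of how a single step could be fed to mkTestStep. The
    # binding name "exampleStep" and the values below are hypothetical and
    # only illustrate the argument shape; they are not part of this test.
    #
    #   exampleStep = mkTestStep 1 {
    #     description = "example step in chroot-only mode";
    #     config.confinement.mode = "chroot-only";
    #     testScript = ''
    #       assert os.getuid() == 0
    #     '';
    #   };
    #
    # Assuming the steps are collected in a list (stepA and stepB are
    # placeholders), lib.imap1 would supply the 1-based index as "num", so
    # the default service names would be "test1", "test2" and so on:
    #
    #   imports = lib.imap1 mkTestStep [ stepA stepB ];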
    parametrisedTests = lib.concatMap ({ user, privateTmp }: let
      withTmp = if privateTmp then "with PrivateTmp" else "without PrivateTmp";
      serviceConfig = if user == "static-user" then {
        User = "chroot-testuser";
        Group = "chroot-testgroup";
      } else if user == "dynamic-user" then {
        DynamicUser = true;
      } else {};
    in [
      { description = "${user}, chroot-only confinement ${withTmp}";
        config = {
          confinement.mode = "chroot-only";
          # Only set if privateTmp is true to ensure that the default is false.
          serviceConfig = serviceConfig // lib.optionalAttrs privateTmp {
            PrivateTmp = true;
          };
        };
        testScript = if user == "root" then ''
assert os.getuid() == 0
assert os.getgid() == 0
assert_permissions({
'bin': Accessibility.READABLE,
'nix': Accessibility.READABLE,
'run': Accessibility.READABLE,
${lib.optionalString privateTmp "'tmp': Accessibility.STICKY,"}
${lib.optionalString privateTmp "'var': Accessibility.READABLE,"}
${lib.optionalString privateTmp "'var/tmp': Accessibility.STICKY,"}
})
'' else ''
assert os.getuid() != 0
assert os.getgid() != 0
assert_permissions({
'bin': Accessibility.READABLE,
'nix': Accessibility.READABLE,
'run': Accessibility.READABLE,
${lib.optionalString privateTmp "'tmp': Accessibility.STICKY,"}
${lib.optionalString privateTmp "'var': Accessibility.READABLE,"}
${lib.optionalString privateTmp "'var/tmp': Accessibility.STICKY,"}
})
'';
}
{ description = "${user}, full APIVFS confinement ${withTmp}";
config = {
# Only set if privateTmp is false to ensure that the default is true.
serviceConfig = serviceConfig // lib.optionalAttrs (!privateTmp) {
PrivateTmp = false;
};
};
testScript = if user == "root" then ''
assert os.getuid() == 0
assert os.getgid() == 0
assert_permissions({
'bin': Accessibility.READABLE,
'nix': Accessibility.READABLE,
${lib.optionalString privateTmp "'tmp': Accessibility.STICKY,"}
'run': Accessibility.WRITABLE,
'proc': Accessibility.SPECIAL,
'sys': Accessibility.SPECIAL,
'dev': Accessibility.WRITABLE,
${lib.optionalString privateTmp "'var': Accessibility.READABLE,"}
${lib.optionalString privateTmp "'var/tmp': Accessibility.STICKY,"}
})
'' else ''
assert os.getuid() != 0
assert os.getgid() != 0
assert_permissions({
'bin': Accessibility.READABLE,
'nix': Accessibility.READABLE,
${lib.optionalString privateTmp "'tmp': Accessibility.STICKY,"}
'run': Accessibility.STICKY,
'proc': Accessibility.SPECIAL,
'sys': Accessibility.SPECIAL,
'dev': Accessibility.SPECIAL,
'dev/shm': Accessibility.STICKY,
'dev/mqueue': Accessibility.STICKY,
${lib.optionalString privateTmp "'var': Accessibility.READABLE,"}
${lib.optionalString privateTmp "'var/tmp': Accessibility.STICKY,"}
})
'';
}
]) (lib.cartesianProduct {
user = [ "root" "dynamic-user" "static-user" ];
privateTmp = [ true false ];
});
in {
imports = lib.imap1 mkTestStep (parametrisedTests ++ [
{ description = "existence of bind-mounted /etc";
config.serviceConfig.BindReadOnlyPaths = [ "/etc" ];
testScript = ''
assert Path('/etc/passwd').read_text()
'';
}
(let
symlink = pkgs.runCommand "symlink" {
target = pkgs.writeText "symlink-target" "got me";
} "ln -s \"$target\" \"$out\"";
in {
description = "check if symlinks are properly bind-mounted";
config.confinement.packages = lib.singleton symlink;
testScript = ''
assert Path('${symlink}').read_text() == 'got me'
'';
})
{ description = "check if StateDirectory works";
config.serviceConfig.User = "chroot-testuser";
config.serviceConfig.Group = "chroot-testgroup";
config.serviceConfig.StateDirectory = "testme";
# We restart on purpose here since we want to check whether the state
# directory actually persists.
config.serviceConfig.Restart = "on-failure";
config.serviceConfig.RestartMode = "direct";
testScript = ''
assert not Path('/tmp/canary').exists()
Path('/tmp/canary').touch()
if (foo := Path('/var/lib/testme/foo')).exists():
  assert foo.read_text() == 'works'
else:
  foo.write_text('works')
  print('<4>Exiting with failure to check persistence on restart.')
  raise SystemExit(1)
'';
}
{ description = "check if /bin/sh works";
testScript = ''
assert Path('/bin/sh').exists()
result = run(
  ['/bin/sh', '-c', 'echo -n bar'],
  capture_output=True,
  check=True,
)
assert result.stdout == b'bar'
'';
}
{ description = "check if suppressing /bin/sh works";
config.confinement.binSh = null;
testScript = ''
assert not Path('/bin/sh').exists()
with pytest.raises(FileNotFoundError):
  run(['/bin/sh', '-c', 'echo foo'])
'';
}
{ description = "check if we can set /bin/sh to something different";
config.confinement.binSh = "${pkgs.hello}/bin/hello";
testScript = ''
assert Path('/bin/sh').exists()
result = run(
  ['/bin/sh', '-g', 'foo'],
  capture_output=True,
  check=True,
)
assert result.stdout == b'foo\n'
'';
}
{ description = "check if only Exec* dependencies are included";
config.environment.FOOBAR = pkgs.writeText "foobar" "eek";
testScript = ''
with pytest.raises(FileNotFoundError):
  Path(os.environ['FOOBAR']).read_text()
'';
}
{ description = "check if fullUnit includes all dependencies";
config.environment.FOOBAR = pkgs.writeText "foobar" "eek";
config.confinement.fullUnit = true;
testScript = ''
assert Path(os.environ['FOOBAR']).read_text() == 'eek'
'';
}
{ description = "check if shipped unit file still works";
# Regression test for https://github.com/NixOS/nixpkgs/issues/157787:
# confinement must keep working when the unit file is shipped via
# systemd.packages (as mpd does), since the confinement options are placed
# in a drop-in directory rather than in a unit file of the same name.
config.confinement.mode = "chroot-only";
rawUnit = ''
[Service]
SystemCallFilter=~kill
SystemCallErrorNumber=ELOOP
'';
testScript = ''
with pytest.raises(OSError) as excinfo:
  os.kill(os.getpid(), signal.SIGKILL)
assert excinfo.value.errno == errno.ELOOP
'';
}
]);
config.users.groups.chroot-testgroup = {};
config.users.users.chroot-testuser = {
isSystemUser = true;
description = "Chroot Test User";
group = "chroot-testgroup";
};
};
testScript = ''
machine.wait_for_unit("multi-user.target")
'';
}