NOTES:
@jakeschurch did not realize that it was already updated on master, but not
backported to the 23.05 channel
Signed-off-by: Jake Schurch <jakeschurch@gmail.com>
We were setting the systemd pre-start script through the
systemd.services.<name>.preStart NixOS option. This option uses a
string containing the pre-start script as input.
In some scenarios, you want to extend this script to perform some
additional actions before launching a container.
At the moment, your only option is to mkForce the pre-start string and
rewrite a preStart script from scratch, potentially vendoring the
Nixpkgs pre-start script in your custom pre-start script. (You can
also create a new service unit in charge of running the custom
pre-start and create a dependency link between the units, but that's
also sub-optimal.)
The systemd.services.<name>.serviceConfig.ExecStartPre NixOS option
gives us a better way to extend a pre-start script. Instead of being a
simple script, this option can be a list of scripts. The NixOS module
system then merges the multiple list declarations instead of
overriding them. This means that if we use the ExecStartPre option, we
can trivially extend the pre-start script: just add the custom script
in the systemd service override and you're done.
ExecStartPre behaves a tiny bit differently from preStart. Instead of
expecting a string containing a script, it expects a path pointing to
a script. We take advantage of this API change to check the pre-start
script with ShellCheck via the pkgs.writeShellApplication function.
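As a rough illustration of the list merging described above, a downstream configuration could append its own step as sketched here. The unit name `podman-myapp` and the script contents are hypothetical; the real unit name depends on the container backend and the container's name.

```nix
{ pkgs, ... }:
{
  # Hypothetical unit name; adjust it to the container service being extended.
  systemd.services."podman-myapp".serviceConfig.ExecStartPre = [
    # List definitions are merged with the entry set by the module instead of
    # replacing it, so no mkForce is needed here.
    "${pkgs.writeShellScript "custom-pre-start" ''
      echo "running custom pre-start step"
    ''}"
  ];
}
```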
This change removes the bespoke logic around identifying block devices.
Instead of trying to find the right device by iterating over
`qemu.drives` and guessing the right partition number (e.g.
/dev/vda{1,2}), devices are now identified by persistent names provided
by udev in /dev/disk/by-*.
Before this change, the root device was formatted on demand in the
initrd. However, this makes it impossible to use filesystem identifiers
to identify devices. Now, the formatting step is performed before the VM
is started. Because some tests rely on the old behaviour, however, a
utility function to restore it is added in
/nixos/tests/common/auto-format-root-device.nix.
Devices that contain neither a partition table nor a filesystem are
identified by their hardware serial number, which is injected via QEMU
(and is thus persistent and predictable). PCI paths are not a reliable
way to identify devices because their availability and numbering depend
on the QEMU machine type.
This change makes the module more robust against changes in QEMU and the
kernel (non-persistent device naming) and, by decoupling abstractions
(i.e. rootDevice, bootPartition, and bootLoaderDevice), enables further
improvement down the line.
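A minimal sketch of what referring to devices by persistent udev names might look like in a VM configuration. The `virtualisation.` prefix and both values are assumptions made for illustration; they are not necessarily the defaults the module ships.

```nix
{
  # Illustrative values only: a filesystem label for the root device and a
  # serial-based by-id name for the device the bootloader is installed on.
  virtualisation.rootDevice = "/dev/disk/by-label/nixos";
  virtualisation.bootLoaderDevice = "/dev/disk/by-id/virtio-root";
}
```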
As with many things, there are scenarios where we want neither to boot
from a disk / bootloader nor to boot directly.
Sometimes, we want to boot through an OptionROM of our NIC, e.g. in netboot
scenarios, or let the firmware decide something, e.g. UEFI PXE (or even
UEFI OptionROM!).
This is composed of:
- `directBoot.enable`: whether to direct boot or not
- `directBoot.initrd`: enables overriding the
`config.system.build.initialRamdisk` default, useful for
netbootRamdisk for example.
This makes it possible.
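A hedged sketch of how the two options might be used together. It assumes the options live under `virtualisation.` and that a netboot module providing `system.build.netbootRamdisk` is imported; the `/initrd` file name is also an assumption.

```nix
{ config, ... }:
{
  virtualisation.directBoot = {
    # Keep direct boot, but hand QEMU a different initrd.
    # Set this to false instead to let the firmware or an OptionROM decide.
    enable = true;
    # Assumed output layout of the netboot ramdisk.
    initrd = "${config.system.build.netbootRamdisk}/initrd";
  };
}
```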
Adds a new option to the virtualisation modules that enables specifying explicitly named network interfaces in QEMU VMs.
The existing `virtualisation.vlans` option is still supported for cases where the name of the network interface is irrelevant.
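For illustration, a named interface might be declared roughly as follows. The option name `virtualisation.interfaces` and its `vlan` attribute are assumptions here, since the message above does not spell them out; `enp1s0` is an arbitrary example name.

```nix
{
  # Attach an interface called enp1s0 to VLAN 1 (assumed option layout).
  virtualisation.interfaces.enp1s0.vlan = 1;
}
```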
Libvirt supports calling user-defined hooks on certain events.
Documentation can be found at https://libvirt.org/hooks.html.
This commit allows specifying these hooks via the
virtualisation.libvirtd.hooks.<name>.* options.
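A small sketch of a hook definition, assuming each hook type (here `qemu`) takes an attribute set mapping a script name to a script path. The hook name `log-events` and its body are hypothetical.

```nix
{ pkgs, ... }:
{
  virtualisation.libvirtd.hooks.qemu = {
    # Hypothetical hook: libvirt passes the guest name and the operation
    # as the first two arguments.
    "log-events" = pkgs.writeShellScript "log-events" ''
      echo "qemu hook: guest=$1 operation=$2" >&2
    '';
  };
}
```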
Calling `eval-config.nix` without a `system` from a Nix flake fails with
`error: attribute 'currentSystem' missing` since #230523. Setting
`system = null` removes the use of `currentSystem` and instead uses the
value from the `nixpkgs` module.
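As a sketch of the flake use case, the call below passes `system = null` and sets the platform through the `nixpkgs` module instead. The configuration name `example` and the `./configuration.nix` path are placeholders.

```nix
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs";

  outputs = { nixpkgs, ... }: {
    nixosConfigurations.example = import "${nixpkgs}/nixos/lib/eval-config.nix" {
      # Avoid the impure builtins.currentSystem lookup.
      system = null;
      modules = [
        # The platform now comes from the nixpkgs module.
        { nixpkgs.hostPlatform = "x86_64-linux"; }
        ./configuration.nix
      ];
    };
  };
}
```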
Context summary:
'vma create' can't otherwise write to tmpfs such as /dev/shm.
This is important when used from non-nixos machines which may
have /build as tmpfs.
VMA is Proxmox's virtual machine image format that wraps QEMU images,
augmenting these with a Proxmox-specific configuration file.
proxmox-image.nix uses the VMA tool to create vma image files.
The VMA tool exists as a patchset on top of QEMU.
VMA writes its output with open() and the O_DIRECT flag.
O_DIRECT does not work on Linux tmpfs [1]. Thus:
$ vma create ~/output.vma ... # works, assuming home isn't tmpfs.
$ vma create /dev/shm/output.vma ... # fails since /dev/shm is tmpfs
Failure results in assert(*errp == NULL).
O_DIRECT is a cache performance hint.
But it currently blocks our usage of nixos-generate -f proxmox from
non-NixOS hosts and Docker.
The patch here simply removes O_DIRECT:
vma-writer.c later performs memalign due to O_DIRECT, but this is
safe to do with or without O_DIRECT.
Ideally, this should be fixed in upstream Proxmox: Perhaps by falling
back to open without O_DIRECT.
Another attempt to fix this SIGABRT is [2], which writes the vma file
directly to the $out/ folder. However, that may still be mounted as
tmpfs, which it is in our case.
[1] https://lore.kernel.org/lkml/45A29EC2.8020502@tmr.com/t/
[2] https://github.com/NixOS/nixpkgs/pull/224282
`useEFIBoot` is somewhat misleading, but we should make it possible to
enable a UEFI environment / firmware without buying into a bootloader.
This makes it possible.
Previously, it was possible to run with a tmpfs / with
`virtualisation.diskImage = null;`. This was likely broken by my changes
in 4b4e4c3ef9.
It is reintroduced by properly disabling the bootloader for now, as it
is complicated to make the two work together.
Now that `useBootLoader` produces a full system image, moving disk
images can be slow because they have a full Nix store in them.
It does not make sense to keep the 9p mountpoint to shadow the
/nix/store of the VM.
We disable it if we have `useBootLoader` and introduce an option for
easy overrides.
This option was introduced in 678eed323f without realizing that this
PR was in flight. Unfortunately, it collides with what this PR does, which
makes it irrelevant.
Therefore, I remove it here.
trying to get all of the podman functionality to work with the wrapper
is becoming more complicated with each release; it isn't sustainable
removing the wrapper does mean that anyone using extraPackages will need to build from source
- remove unnecessary serviceConfig overrides
- set HELPER_BINARIES_DIR to libexec/podman
- use install.bin target on linux for podman/tmpfiles
- also installs quadlet/rootlessport in libexec
- symlink binaries from helpersBin into HELPER_BINARIES_DIR
- remove unnecessary rootlessport output
- remove unnecessary substituteInPlace
The typo creates an empty directory named 0755 in the initrd rootfs rather than
creating the Nix store directories with mode 0755.
I guess setting the mode is not strictly necessary if it worked before
this change, but I'll leave the `-m 0755` in just in case.
trying to get all of the podman functionality to work with the wrapper
is becoming more complicated with each release; it isn't sustainable
removing the wrapper does mean that anyone using extraPackages will need to build from source
- include pkgs.zfs by default in the wrapped podman used by the module so it is cached
- anyone using zfsUnstable will need to build from source
- remove unnecessary serviceConfig overrides
- set HELPER_BINARIES_DIR during build
- use install.bin target on linux for podman/tmpfiles
- also installs quadlet/rootlessport in libexec
- remove unnecessary rootlessport output
- remove unnecessary substituteInPlace
This is because vSphere version 6.7.0.51000 errors with
    Issues detected with selected template. Details: -
    78:7:VALUE_ILLEGAL: Value ''3'' of Parent element does not refer
    to a ref of type DiskControllerReference.
when using SATA.
...for explicitly named network interfaces
This reverts commit 6ae3e7695e.
(and evaluation fixups 08d26bbb727aed90a969)
Some of the tests fail or time out after the merge.
Adds a new option to the virtualisation modules that enables specifying
explicitly named network interfaces in QEMU VMs. The existing
`virtualisation.vlans` is still supported for cases where the name of
the network interface is irrelevant.
Previously, secrets were named according to the initrd they were
associated with. This created a problem: If secrets were changed whilst
the initrd remained the same, there were two versions of the secrets
with one initrd. The result was that only one version of the secrets would
be recorded into the /boot partition and get used. AFAICT this would
only be the oldest version of the secrets for the given initrd version.
This manifests as #114594, which I found frustrating while trying to use
initrd secrets for the first time. While developing the secrets I found
I could not get new versions of the secrets to take effect.
Additionally, it's a nasty issue to run into if you have cause to change
the initrd secrets (for credential rotation, etc.) and then discover that
the change does not take effect, or that you can't roll back as you
would expect.
Additional changes in this patch:
* Add a regression test that switching to another grub configuration
  with the alternate secrets works. This test relies on the fact that it
  is not changing the initrd. I have checked that the test fails if I
  undo my change.
* Persist the useBootLoader disk state, similarly to other boot state.
  * I had to do this, otherwise I could not find a route to testing the
    alternate boot configuration. I did attempt a few different ways of
    testing this, including directly running install-grub.pl, but what
    I've settled on is most like what a user would do and avoids
    depending on lots of internal details.
  * Writing tests that exercise the boot process is a bit tricky (see
    hibernate.nix and installer.nix for inspiration). I found that, in
    addition to having to copy quite a bit of code, I still couldn't get
    things to work as desired since the bootloader state was being clobbered.
My change to persist the useBootLoader state could break things,
conceptually. I need some help here discovering if that is the case,
possibly by letting this run through a staging CI if there is one.
Fix #114594.
cc potential reviewers:
@lopsided98 (original implementer) @joachifm (original reviewer),
@wkennington (numerous fixes to grub-install.pl), @lheckemann (wrote
original secrets test).
The aarch64-linux kernel and initrd recently eclipsed 60M, causing the
boot disk image build to run out of space and fail. Double the size of
the image to 120M to fix the issue.
The disk image is stored in expandable qcow2 format, so only the space
actually used by files in the image is consumed. Therefore, other
architectures are not unfairly penalized, and the output size does not
suddenly double.
This also fixes NixOS tests which use this option, like systemd-boot's.
The agent has not been updated for a very long time. In addition to
updating to the newest tagged version, the change creates a package for
it.
The existing version has issues with the new python2.7 package not
containing the crypt.so file. I also believe that commit
6910a4eea0 introduced a
regression that caused the shebang to not be updated.
This adds a new ``parallelShutdown`` option that allows users to control
how many guests can be shut down concurrently. Allowing multiple virtual
machines to be shut down at the same time reduces the amount of time it
takes to reboot the host.
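A minimal usage sketch, assuming the option sits under `virtualisation.libvirtd`; the value 4 is an arbitrary example.

```nix
{
  virtualisation.libvirtd = {
    enable = true;
    # Shut down up to four guests at the same time when the host goes down.
    parallelShutdown = 4;
  };
}
```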
Upstream documentation: https://www.libvirt.org/manpages/libvirt-guests.html#files
This fixes `lxd init`, which previously failed like this:
$ yes "" | lxd init
[...]
Error: Failed to create storage pool "default": Failed to run: losetup --find --nooverlap --direct-io=on --show /var/lib/lxd/disks/default.img: exec: "losetup": executable file not found in $PATH
Add a section on ordering option definitions.
Also mention `mkDefault` in the section on `mkOverride`.
Clarify the code a bit by renaming `defaultPriority` to
`defaultOverridePriority` and introducing `defaultOrderPriority`.
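To make the distinction between the two kinds of priority concrete, here is a small sketch. The options used (`services.openssh.enable`, `environment.shellInit`) are only convenient examples and are not taken from the change itself.

```nix
{ lib, ... }:
{
  # Override priority: mkDefault is mkOverride 1000, so a plain definition
  # elsewhere (priority 100, the defaultOverridePriority) wins over it.
  services.openssh.enable = lib.mkDefault true;

  # Order priority: mkBefore is mkOrder 500 and mkAfter is mkOrder 1500;
  # a plain definition sits at 1000 (the defaultOrderPriority).
  # The merged value therefore reads: first, middle, last.
  environment.shellInit = lib.mkMerge [
    (lib.mkAfter "echo last")
    (lib.mkBefore "echo first")
    "echo middle"
  ];
}
```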
We don't need both wget and curl, so let's use only curl (which is
part of a minimal NixOS closure, unlike wget).
Logging to the console is helpful for debugging.
Instances without SSH keys configured will receive a 404 from the
metadata server when attempting to fetch an SSH key. This is not an
actual problem though, and shouldn't result in the service failing.
If the metadata server cannot be reached, the script will fail at an
earlier stage when attempting to get authentication data.
This also removes automatic enablement/mounting of instance store swap
devices and ext3 filesystems. This behaviour is strongly opinionated
and shouldn't be enabled by default.
The unionfs behaviour never took effect anyway, because the AMI
manifest path only exists for instance store-backed AMIs, which have
not been supported by nixpkgs since
84742e2293 (2019).
Previously we did socket-activation but this breaks the autostart
feature since upstream expects libvirtd to be started unconditionally on
boot.
Fixes#171623.
Allow building Proxmox images other than Legacy-BIOS-only ones.
The default is unchanged.
To build a UEFI Proxmox image, use:
    proxmox.qemuConf.bios = "ovmf";
(the default is "seabios")
To build an image bootable with both "seabios" and "ovmf", use:
    partitionTableType = "hybrid";
The BIOS can then be switched in Proxmox between "seabios" and "ovmf" and the VM still boots.
(GRUB2 only; systemd-boot does not boot under "seabios".)
To build a systemd-boot UEFI image, use:
    proxmox.qemuConf.bios = "ovmf";
    boot.loader.systemd-boot.enable = true;
This adds an option to the qemu virtualisation module to isolate the
guest's network from the host's and outside networks.
This is particularly useful for development sandboxes for example.
The option is disabled by default to preserve the current behaviour.
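A sketch of enabling the isolation for a sandbox VM. The option name `virtualisation.restrictNetwork` is an assumption, since the message above does not name the option.

```nix
{
  # Assumed option name: cut the guest off from the host's and outside networks.
  virtualisation.restrictNetwork = true;
}
```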
Use hostPlatform if both the host's and the container's nixpkgs support
hostPlatform, otherwise fall back to localSystem. This preserves backwards
compatibility.