Adds a function to wait for a new QMP event with a model filter
so that you can expect specific type of events with specific payloads.
e.g. a guest-reset-induced shutdown event.
Building a python environment with python3Minimal requires hydra
to bootstrap pip and build all packages used in the environment
which would otherwise not be built. This reduces cache re-use and duplicates things.
Also common dependencies normally included in python itself
are not properly checked and can cause hard to debug errors
because everyone just assumes those modules are there.
Aliases exist for a reason. Sure it is nice to make sure that
some aliases aren't used within Nixpkgs, but this creates two problems
which are far worse than your failing to meet your neatness compulsions.
- Users encounter missing attributes, https://github.com/NixOS/nixpkgs/issues/264577
wasting their time, stalling their progress, and even occupying others
time that would be better spent on fixing *real* issues.
- Hydra doesn't treat evaluation errors seriously enough, with the
effect that actual relevant test failures are masked by evaluation
failures such as those caused by this no aliases business.
- We don't even have the infrastructure to get rid of aliases, because
all warnings in package attributes are disallowed by Nixpkgs CI
tooling, last I checked.
Before re-disabling this, make sure that
- An actually helpful deprecation process is in place.
- Aliases are still allowed when `nixos-lib.runTests` and
`pkgs.testers.runNixOSTest` are invoked by external projects.
For instance, `all-tests.nix` could provide such an
override (e.g. with `newScope`).
This is a fixup for c1ae82f448.
nix' `passAsFile` does not create empty files for variables that are
`null`.
This results in the following error for units that have no overrides or
content, but are, e.g. `wantedBy`:
`mv: cannot stat '': No such file or directory`.
Minimal reproducer:
`systemd.units.empty.wantedBy = [ "multi-user.target" ];`
This is often necessary when a unit is loaded in via `systemd.packages`.
From now on, we will aim to ensure that the test driver
gets tested by OfBorg using all our available tests.
This commit adds the driver timeout test to the driver.
For `testBuildFailure` and similar functions, we need a full blown derivation and not a lazy one.
This is an internal option for test framework developers.
Since the debut of the test-driver, we didn't obtain
a race timer with the test execution to ensure that tests doesn't run beyond
a certain amount of time.
This is particularly important when you are running into hanging tests
which cannot be detected by current facilities (requires more pvpanic wiring up, QMP
API stuff, etc.).
Two easy examples:
- Some QEMU tests may get stuck in some situation and run for more than 24 hours → we default to 1 hour max.
- Some QEMU tests may panic in the wrong place, e.g. UEFI firmware or worse → end users can set a "reasonable" amount of time
And then, we should let the retry logic retest them until they succeed and adjust
their global timeouts.
Of course, this does not help with the fact that the timeout may need to be
a function of the actual busyness of the machine running the tests.
This is only one step towards increased reliability.
Now that we have a QMP client, we can wire it up in the test driver.
For now, it is almost completely useless because of the need of a constant "event loop", especially
for event listening.
In the next commits, we will slowly enable more and more usecases.
For some time now the attrset returned by `evalModules` has
`type = "configuration"`.
This is a clean refactor because the name is not exposed.
(never is for simple lambda)
When listening on unix sockets, it doesn't make sense to specify a port
for nginx's listen directive.
Since nginx defaults to port 80 when the port isn't specified (but the
address is), we can change the default for the option to null as well
without changing any behaviour.
This also makes configuration available if you just run those tools locally.
Also use ruff instead of pylint because it's faster and more
comprehensive.
Since 008f9f0cd4
("nixos/test-driver: actually use the backdoor message to wait for backdoor"),
when boot is still computering, we can get a tons of empty strings in response to the shell.
This is not really useful to print and waste the disk space for any CI system that logs them.
We stop logging chunks whenever they are empty.
While working on #192270, I noticed that only some wait_for_* helper
functions make the timeout configurable. I think we should be able to
customize it in all cases
https://git.sr.ht/~c00w/nixpkgs/tree/sdimagebtrfs/item/nixos/lib/make-btrfs-fs.nix
I made only one change which was to use `btrfs check` instead of
`fsck.btrfs` because of this warning
```
btrfs-fs.img> ++ fsck.btrfs /nix/store/6d46rc768c140asy6rjpc5rk568r36zq-btrfs-fs.img
btrfs-fs.img> If you wish to check the consistency of a BTRFS filesystem or
btrfs-fs.img> repair a damaged filesystem, see btrfs(8) subcommand 'check'.
```
Co-authored-by: Colin L Rice <colin@daedrum.net>
This commit is a fixup for a regression introduced by
0bdba6c99b.
Before the regression, it was possible to build images without grub or a
kernel (e.g. to boot other kernels with qemu -kernel.
After the regression, such images fail to build. Since
config.boog.loader.grub.enable is false in that scenario, grub.device is
emptystring. While this happens not to be an issue of `ln`, `dirname`
fails on emptystring.
With this change, we guard both commands to only be run when grub is
actually enabled. Images with and without grub succesfully build with
this change.
New EDK2 sets up the backdoor port as a serial console, which feeds the test driver
a bunch of boot logs it can safely ignore. Do so by waiting for the message the
backdoor shell prints before doing anything else.
According to systemd.netdev manpage:
```
MACAddress=
Specifies the MAC address to use for the device, or takes the special value "none". When "none", systemd-networkd does not request the MAC address for
the device, and the kernel will assign a random MAC address. For "tun", "tap", or "l2tp" devices, the MACAddress= setting in the [NetDev] section is
not supported and will be ignored. Please specify it in the [Link] section of the corresponding systemd.network(5) file. If this option is not set,
"vlan" device inherits the MAC address of the master interface. For other kind of netdevs, if this option is not set, then the MAC address is
generated based on the interface name and the machine-id(5).
Note, even if "none" is specified, systemd-udevd will assign the persistent MAC address for the device, as 99-default.link has
MACAddressPolicy=persistent. So, it is also necessary to create a custom .link file for the device, if the MAC address assignment is not desired.
```
Therefore, `none` is an acceptable value.
When lib overrides were used, before this commit, they would not be made
available in the configuration evaluation of nixosTest's nodes.
Sample code:
``` nix
let
pkgs = import ./. {
overlays = [
(new: old: {
lib = old.lib.extend (self: super: {
sorry_dave = builtins.trace "There are no pod bay doors" "sorry dave";
});
})
];
};
in
pkgs.testers.nixosTest {
name = "demo lib overlay";
nodes = {
machine = { lib, ... }: {
environment.etc."got-lib-overlay".text = lib.sorry_dave;
};
};
testScript = { nodes }:
''
start_all()
machine.succeed('grep dave /etc/got-lib-overlay')
'';
}
```
By some miracle, before, it was possible to reconnect to the `node1` without
doing any relevant dance.
But now we are direct booting (¿), it seems like we need to do the right things.
This introduces a `check_output` flag for `execute` because we do not want to steal the
messages from the backdoor service as we might execute the kexec too fast compared
to when we will reconnect.
Therefore, we will let the message in the pipe if needed.
This change removes the bespoke logic around identifying block devices.
Instead of trying to find the right device by iterating over
`qemu.drives` and guessing the right partition number (e.g.
/dev/vda{1,2}), devices are now identified by persistent names provided
by udev in /dev/disk/by-*.
Before this change, the root device was formatted on demand in the
initrd. However, this makes it impossible to use filesystem identifiers
to identify devices. Now, the formatting step is performed before the VM
is started. Because some tests, however, rely on this behaviour, a
utility function to replace this behaviour in added in
/nixos/tests/common/auto-format-root-device.nix.
Devices that contain neither a partition table nor a filesystem are
identified by their hardware serial number which is injecetd via QEMU
(and is thus persistent and predictable). PCI paths are not a reliably
way to identify devices because their availability and numbering depends
on the QEMU machine type.
This change makes the module more robust against changes in QEMU and the
kernel (non-persistent device naming) and by decoupling abstractions
(i.e. rootDevice, bootPartition, and bootLoaderDevice) enables further
improvement down the line.
they're no longer necessary for us and will almost definitely start to
rot now (like commonmark and asciidoc outputs did previously). most
existing users seem to take the docbook output and run it through pandoc
to generate html, those can easily migrate to use commonmark instead.
other users will hopefully pipe up when they notice that things they rely
on are going away.
optionsUsedDocbook has only been around for one release and only exposed
to allow other places to generate warnings, so that does not deserve
such precautions.
with everything being rendered from markdown now we no longer need to
postprocess any options.xml that may be requested from elsewhere. we'll
don't need to keep the module path check either since that's done by
optionsJSON now.
docbook is now gone and we can flip the defaults. we won't keep the
command line args around (unlike the make-options-docs argument) because
nixos-render-docs should not be considered an exposed API.
with docbook no longer supported we can default to markdown option docs.
we'll keep the parameter around for a bit to not break external users
who set it to true. we don't know of any users that do, so the
deprecation period may be rather short for this one.
it's been long in the making, and with 23.05 out we can finally disable
docbook option docs and default to markdown instead. this brings a
massive speed boost in manual and manpage builds, so much so that we may
consider enabling user module documentation by default.
we don't remove the docbook support code entirely yet because it's a lot
all over, and probably better removed in multiple separate changes.
- `wait_until_fails` was not passing through its `timeout` argument to
the internal `retry` function, hence was always using 900 seconds (the
default timeout for `retry`) rather than the user-specified value.
Previously, `wait_for_console_text` would block indefinitely until there were lines
shown in the buffer.
This is highly annoying when testing for things that can just hang for some reasons.
This introduces a classical timeout mechanism via non-blocking get on the Queue.
This is useful whenever you want to diagnose the current state of UEFI
variables, to assert that bootloaders or boot programs (systemd-stub)
did their job correctly and set their variables accordingly.
In the future, it can enable inspecting SecureBoot keys also.
This warning was added a year and a half ago, but still no test in
NixOS directly instantiates the machine class, presumably because it's
not actually possible for a test to do so without losing
functionality. For example, there's no way for a NixOS test to access
the output directory that create_machine passes to the Machine
constructor.
This warning is therefore just contributing to alert fatigue for
users, who are unable to follow its advice. Once it's actually
possible to do what it suggests, the warning can be reintroduced.
Adds a new option to the virtualisation modules that enables specifying explicitly named network interfaces in QEMU VMs.
The existing `virtualisation.vlans` option is still supported for cases where the name of the network interface is irrelevant.
By adding this option indirection, a test can declare all by itself
that it needs a custom nixpkgs. This is a more convenient way of
going about this when the caller of the test framework receives a
`node.pkgs` unconditionally.