Since the debut of the test-driver, we didn't obtain
a race timer with the test execution to ensure that tests doesn't run beyond
a certain amount of time.
This is particularly important when you are running into hanging tests
which cannot be detected by current facilities (requires more pvpanic wiring up, QMP
API stuff, etc.).
Two easy examples:
- Some QEMU tests may get stuck in some situation and run for more than 24 hours → we default to 1 hour max.
- Some QEMU tests may panic in the wrong place, e.g. UEFI firmware or worse → end users can set a "reasonable" amount of time
And then, we should let the retry logic retest them until they succeed and adjust
their global timeouts.
Of course, this does not help with the fact that the timeout may need to be
a function of the actual busyness of the machine running the tests.
This is only one step towards increased reliability.
Now that we have a QMP client, we can wire it up in the test driver.
For now, it is almost completely useless because of the need of a constant "event loop", especially
for event listening.
In the next commits, we will slowly enable more and more usecases.
For some time now the attrset returned by `evalModules` has
`type = "configuration"`.
This is a clean refactor because the name is not exposed.
(never is for simple lambda)
When listening on unix sockets, it doesn't make sense to specify a port
for nginx's listen directive.
Since nginx defaults to port 80 when the port isn't specified (but the
address is), we can change the default for the option to null as well
without changing any behaviour.
This also makes configuration available if you just run those tools locally.
Also use ruff instead of pylint because it's faster and more
comprehensive.
Since 008f9f0cd4
("nixos/test-driver: actually use the backdoor message to wait for backdoor"),
when boot is still computering, we can get a tons of empty strings in response to the shell.
This is not really useful to print and waste the disk space for any CI system that logs them.
We stop logging chunks whenever they are empty.
While working on #192270, I noticed that only some wait_for_* helper
functions make the timeout configurable. I think we should be able to
customize it in all cases
https://git.sr.ht/~c00w/nixpkgs/tree/sdimagebtrfs/item/nixos/lib/make-btrfs-fs.nix
I made only one change which was to use `btrfs check` instead of
`fsck.btrfs` because of this warning
```
btrfs-fs.img> ++ fsck.btrfs /nix/store/6d46rc768c140asy6rjpc5rk568r36zq-btrfs-fs.img
btrfs-fs.img> If you wish to check the consistency of a BTRFS filesystem or
btrfs-fs.img> repair a damaged filesystem, see btrfs(8) subcommand 'check'.
```
Co-authored-by: Colin L Rice <colin@daedrum.net>
This commit is a fixup for a regression introduced by
0bdba6c99b.
Before the regression, it was possible to build images without grub or a
kernel (e.g. to boot other kernels with qemu -kernel.
After the regression, such images fail to build. Since
config.boog.loader.grub.enable is false in that scenario, grub.device is
emptystring. While this happens not to be an issue of `ln`, `dirname`
fails on emptystring.
With this change, we guard both commands to only be run when grub is
actually enabled. Images with and without grub succesfully build with
this change.
New EDK2 sets up the backdoor port as a serial console, which feeds the test driver
a bunch of boot logs it can safely ignore. Do so by waiting for the message the
backdoor shell prints before doing anything else.
According to systemd.netdev manpage:
```
MACAddress=
Specifies the MAC address to use for the device, or takes the special value "none". When "none", systemd-networkd does not request the MAC address for
the device, and the kernel will assign a random MAC address. For "tun", "tap", or "l2tp" devices, the MACAddress= setting in the [NetDev] section is
not supported and will be ignored. Please specify it in the [Link] section of the corresponding systemd.network(5) file. If this option is not set,
"vlan" device inherits the MAC address of the master interface. For other kind of netdevs, if this option is not set, then the MAC address is
generated based on the interface name and the machine-id(5).
Note, even if "none" is specified, systemd-udevd will assign the persistent MAC address for the device, as 99-default.link has
MACAddressPolicy=persistent. So, it is also necessary to create a custom .link file for the device, if the MAC address assignment is not desired.
```
Therefore, `none` is an acceptable value.
When lib overrides were used, before this commit, they would not be made
available in the configuration evaluation of nixosTest's nodes.
Sample code:
``` nix
let
pkgs = import ./. {
overlays = [
(new: old: {
lib = old.lib.extend (self: super: {
sorry_dave = builtins.trace "There are no pod bay doors" "sorry dave";
});
})
];
};
in
pkgs.testers.nixosTest {
name = "demo lib overlay";
nodes = {
machine = { lib, ... }: {
environment.etc."got-lib-overlay".text = lib.sorry_dave;
};
};
testScript = { nodes }:
''
start_all()
machine.succeed('grep dave /etc/got-lib-overlay')
'';
}
```
By some miracle, before, it was possible to reconnect to the `node1` without
doing any relevant dance.
But now we are direct booting (¿), it seems like we need to do the right things.
This introduces a `check_output` flag for `execute` because we do not want to steal the
messages from the backdoor service as we might execute the kexec too fast compared
to when we will reconnect.
Therefore, we will let the message in the pipe if needed.
This change removes the bespoke logic around identifying block devices.
Instead of trying to find the right device by iterating over
`qemu.drives` and guessing the right partition number (e.g.
/dev/vda{1,2}), devices are now identified by persistent names provided
by udev in /dev/disk/by-*.
Before this change, the root device was formatted on demand in the
initrd. However, this makes it impossible to use filesystem identifiers
to identify devices. Now, the formatting step is performed before the VM
is started. Because some tests, however, rely on this behaviour, a
utility function to replace this behaviour in added in
/nixos/tests/common/auto-format-root-device.nix.
Devices that contain neither a partition table nor a filesystem are
identified by their hardware serial number which is injecetd via QEMU
(and is thus persistent and predictable). PCI paths are not a reliably
way to identify devices because their availability and numbering depends
on the QEMU machine type.
This change makes the module more robust against changes in QEMU and the
kernel (non-persistent device naming) and by decoupling abstractions
(i.e. rootDevice, bootPartition, and bootLoaderDevice) enables further
improvement down the line.
they're no longer necessary for us and will almost definitely start to
rot now (like commonmark and asciidoc outputs did previously). most
existing users seem to take the docbook output and run it through pandoc
to generate html, those can easily migrate to use commonmark instead.
other users will hopefully pipe up when they notice that things they rely
on are going away.
optionsUsedDocbook has only been around for one release and only exposed
to allow other places to generate warnings, so that does not deserve
such precautions.
with everything being rendered from markdown now we no longer need to
postprocess any options.xml that may be requested from elsewhere. we'll
don't need to keep the module path check either since that's done by
optionsJSON now.
docbook is now gone and we can flip the defaults. we won't keep the
command line args around (unlike the make-options-docs argument) because
nixos-render-docs should not be considered an exposed API.
with docbook no longer supported we can default to markdown option docs.
we'll keep the parameter around for a bit to not break external users
who set it to true. we don't know of any users that do, so the
deprecation period may be rather short for this one.
it's been long in the making, and with 23.05 out we can finally disable
docbook option docs and default to markdown instead. this brings a
massive speed boost in manual and manpage builds, so much so that we may
consider enabling user module documentation by default.
we don't remove the docbook support code entirely yet because it's a lot
all over, and probably better removed in multiple separate changes.
- `wait_until_fails` was not passing through its `timeout` argument to
the internal `retry` function, hence was always using 900 seconds (the
default timeout for `retry`) rather than the user-specified value.
Previously, `wait_for_console_text` would block indefinitely until there were lines
shown in the buffer.
This is highly annoying when testing for things that can just hang for some reasons.
This introduces a classical timeout mechanism via non-blocking get on the Queue.
This is useful whenever you want to diagnose the current state of UEFI
variables, to assert that bootloaders or boot programs (systemd-stub)
did their job correctly and set their variables accordingly.
In the future, it can enable inspecting SecureBoot keys also.
This warning was added a year and a half ago, but still no test in
NixOS directly instantiates the machine class, presumably because it's
not actually possible for a test to do so without losing
functionality. For example, there's no way for a NixOS test to access
the output directory that create_machine passes to the Machine
constructor.
This warning is therefore just contributing to alert fatigue for
users, who are unable to follow its advice. Once it's actually
possible to do what it suggests, the warning can be reintroduced.
Adds a new option to the virtualisation modules that enables specifying explicitly named network interfaces in QEMU VMs.
The existing `virtualisation.vlans` option is still supported for cases where the name of the network interface is irrelevant.
By adding this option indirection, a test can declare all by itself
that it needs a custom nixpkgs. This is a more convenient way of
going about this when the caller of the test framework receives a
`node.pkgs` unconditionally.
mkIf is unnecessary when the condition is statically known - that is
knowable before entering the module evaluation.
By changing this to a precomputed module, we support changing the
defined options to readOnly options.
This change is made for two reasons:
1. If `toString config.restartTriggers` containes `\n`, systemd unit
file will be ill-formed.
2. This change can limit length of the trigger, although it doesn't
matter in most cases.
This allows modules that declare their class to be checked.
While that's not most user modules, frameworks can take advantage
of this by setting declaring the module class for their users.
That way, the mistake of importing a module into the wrong hierarchy
can be reported more clearly in some cases.
`make-disk-image` is a tool for creating VM images. It takes an argument
`contents` that allows one to specify files and directories that should
be copied into the VM image. However, directories end up not at the
specified target, but instead at a subdirectory of the target, with a
nix-store-like path, e.g.
`/target/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-source`. See issue
https://github.com/NixOS/nixpkgs/issues/226203 .
This change adds a test for make-disk-image's contents directory
handling and adds a fix (appending `/` to rsync input directory names).
This closes issue https://github.com/NixOS/nixpkgs/issues/226203 .
What the code was trying to do was helpfully add a directory and
extension if none were specified, but it did this by checking whether
the filename was composed of a very limited character set that didn't
even include dashes.
With this change, the intention of the code is clearer, and I can put
dashes in my screenshot names.
the old method of pasting parts of options.json into a markdown document
and hoping for the best no longer works now that options.json contains
more than just docbook. given the infrastructure we have now we can
actually render options.md properly, so we may as well do that.
options processing is pretty slow right now, mostly because the
markdown-it-py parser is pure python (and with performance
pessimizations at that). options parsing *is* embarassingly parallel
though, so we can just fork out all the work to worker processes and
collect the results.
multiprocessing probably has a greater benefit on linux than on darwin
since the worker spawning method darwin uses is less efficient than
fork() on linux. this hasn't been tested on darwin, only on linux, but
if anything darwin will be faster with its preferred method.
The output of a command is not guaranteed to be valid UTF-8, so the
decoding can fail raising UnicodeDecodeError. If this happens during a
`succeeds` the check will be erroneously marked failed.
This changes the error handling to the "replace" mode, where invalid
codepoints are replaced with � (REPLACEMENT CHARACTER U+FFFD) and the
decoding can go on.
...for explicitly named network interfaces
This reverts commit 6ae3e7695e.
(and evaluation fixups 08d26bbb727aed90a969)
Some of the tests fail or time out after the merge.
using environment variables isn't great once multiple input or output
formats get involved (which will happen soon). now is a good time to set
a pattern for future converters.
this new package shall eventually contain the rendering code necessary
to produce the entirety of the nixos (not nixpkgs) manual, in all of its
various output formats.
`shell_interact()` is currently not nice to use. If you try to cancel
the socat process, it will also break the nixos test. Furthermore
ptpython creates it's own terminal that subprocesses are running in,
which breaks some of the terminal features of socat.
Hence this commit extends `shell_interact` to allow also to connect to
arbitrary servers i.e. tcp servers started by socat.
the rest of the nixos manual has them enabled, so we should enable them
here too for consistency.
this changes rendered output pervasively. changes also include quotes in
types (eg in `strings concatenated with "\n"`), but since those are not
code this is probably fine. if not we can probably add a myst role to
inhibit replacements.
the rules are fixed, and we want to support all of them (or throw a
useful error message). this will also become the base for a generic
renderer system, so let's just list all the rules statically.
this restores mergeJSON to its former glory if…merging json, and
extracts the MD rendering into a new script that will run instead of the
py+nix+xslt pipeline we previously ran to convert options.json to docbook.
this change alone gives a noticable performance boost when building
docs (18s instead of 27s to build optionsDocBook).
no changes to rendered output, except for a single example in the
rsnapshot module that uses hard tabs for indentation instead of spaces.
this probably isn't important.
docbook warnings remain with mergeJSON since the other processing steps
output single files instead of directories. since we'll only keep the
check until 23.11 this is probably also not important to fix.
also contains a few improvements to error reporting in the MD renderers.
Adds a new option to the virtualisation modules that enables specifying
explicitly named network interfaces in QEMU VMs. The existing
`virtualisation.vlans` is still supported for cases where the name of
the network interface is irrelevant.
only whitespace changes (mostly empty descriptions rendered as literal
line breaks and trailing space toPretty generates, but that were dropped
by mistune).
don't generate docbook for related packages, generate markdown instead.
this could be extended further to not even generate markdown but have
mergeJSON handle all of the rendering. markdown will work fine for now
though.
only whitespace changes to rendered outputs, all in the vicinity or body
of admonitions. previously admonitions would not receive paragraph
breaks even when they should have because the description postprocessing
did not match on their contents.