By some miracle, before, it was possible to reconnect to the `node1` without
doing any relevant dance.
But now we are direct booting (¿), it seems like we need to do the right things.
This introduces a `check_output` flag for `execute` because we do not want to steal the
messages from the backdoor service as we might execute the kexec too fast compared
to when we will reconnect.
Therefore, we will let the message in the pipe if needed.
- `wait_until_fails` was not passing through its `timeout` argument to
the internal `retry` function, hence was always using 900 seconds (the
default timeout for `retry`) rather than the user-specified value.
Previously, `wait_for_console_text` would block indefinitely until there were lines
shown in the buffer.
This is highly annoying when testing for things that can just hang for some reasons.
This introduces a classical timeout mechanism via non-blocking get on the Queue.
This is useful whenever you want to diagnose the current state of UEFI
variables, to assert that bootloaders or boot programs (systemd-stub)
did their job correctly and set their variables accordingly.
In the future, it can enable inspecting SecureBoot keys also.
This warning was added a year and a half ago, but still no test in
NixOS directly instantiates the machine class, presumably because it's
not actually possible for a test to do so without losing
functionality. For example, there's no way for a NixOS test to access
the output directory that create_machine passes to the Machine
constructor.
This warning is therefore just contributing to alert fatigue for
users, who are unable to follow its advice. Once it's actually
possible to do what it suggests, the warning can be reintroduced.
What the code was trying to do was helpfully add a directory and
extension if none were specified, but it did this by checking whether
the filename was composed of a very limited character set that didn't
even include dashes.
With this change, the intention of the code is clearer, and I can put
dashes in my screenshot names.
The output of a command is not guaranteed to be valid UTF-8, so the
decoding can fail raising UnicodeDecodeError. If this happens during a
`succeeds` the check will be erroneously marked failed.
This changes the error handling to the "replace" mode, where invalid
codepoints are replaced with � (REPLACEMENT CHARACTER U+FFFD) and the
decoding can go on.
`shell_interact()` is currently not nice to use. If you try to cancel
the socat process, it will also break the nixos test. Furthermore
ptpython creates it's own terminal that subprocesses are running in,
which breaks some of the terminal features of socat.
Hence this commit extends `shell_interact` to allow also to connect to
arbitrary servers i.e. tcp servers started by socat.
A few places used Unicode U+2018/U+2019 left/right single quotes (but
not always correctly balanced). Let's just use plain ASCII single quotes
everywhere.
For example, the wait_for_unit() call in the Moodle test times out for
myself and others[1], so it would be good to be able to increase it to
something less likely to be hit by a test that would otherwise pass.
[1]: https://github.com/NixOS/nixpkgs/pull/177052#issue-1266336706
Within a dual VM test-setup a strange behaviour was observed.
The two VMs are connected via one vde_switch instance
(instancevirtualisation.vlans = [ 1 ]; IMO a bad attribute name for
switch instances, has nothing to do with VLANs in sense of 802.1Q).
A ping on the base interface (eth1) works, but not on VLAN
subinterfaces (vlan1@eth1). A tcpdump of eth1 includes the ARP requests
tagged with the subinterfaces VLAN ID, but responses seems not to pass
the vde_switch. This works fine if performed on the base interface.
Putting the vde_switch in hub mode results in flooding
traffic to all vde_switch ports. This results in a expected behaviour
and a ping on a VLAN subinterface works as expected.
Signed-off-by: Philippe Schaaf <philippe.schaaf@secunet.com>
Without this fix, setting the shellopts in `machine.execute` is
inconsitent. When no timeout is used, shellopts `set -euo pipefail` are
applied to the command as expected. When a timeout is specified, the
shellopts are not applied to the command itself (which is called inside
a `sh -c` that doesn't inherit the shellopts) but rather to the
`timeout` command, leading to the following full command:
```bash
(set -euo pipefail; timeout 900 sh -c 'cmd') | (base64 --wrap 0; echo)\n
```
With this fix, this is the command we get:
```bash
timeout 900 sh -c 'set -euo pipefail; false | true') | (base64 --wrap 0; echo)\n
```
Naively deduplicate VLANs in the python driver for NixOS tests. The
current implementation accidentally works, since the VLan class mutates
the environment. On construction it sets QEMU_VDE_SOCKET_${id} and this
environment variable gets overwritten once a second VLAN with the same
id is constructed. Because the NIC flags passed to qemu just use the
QEMU_VDE_SOCKET_${id} environment variable, this implicitly chooses a
single vde_switch process for each VLAN.
However, this leads to unusable vde_switch processes being spawned in
each test run and as a side effect makes it impossible to access the
correct VLan objects in the interactive test driver. It also makes it
remarkably hard to understand why the current implementation ever
worked.
The aarch64-linux versions of the boot.uefiUsb and boot.uefiCdrom tests
were broken by b0fc9da879.
That commit was a refactor which omitted the qemuBinary option, which was
previously available in the legacy start command. This restores that
option and fixes the tests previously mentioned.