Without this fix, setting the shellopts in `machine.execute` is
inconsitent. When no timeout is used, shellopts `set -euo pipefail` are
applied to the command as expected. When a timeout is specified, the
shellopts are not applied to the command itself (which is called inside
a `sh -c` that doesn't inherit the shellopts) but rather to the
`timeout` command, leading to the following full command:
```bash
(set -euo pipefail; timeout 900 sh -c 'cmd') | (base64 --wrap 0; echo)\n
```
With this fix, this is the command we get:
```bash
timeout 900 sh -c 'set -euo pipefail; false | true') | (base64 --wrap 0; echo)\n
```
Naively deduplicate VLANs in the python driver for NixOS tests. The
current implementation accidentally works, since the VLan class mutates
the environment. On construction it sets QEMU_VDE_SOCKET_${id} and this
environment variable gets overwritten once a second VLAN with the same
id is constructed. Because the NIC flags passed to qemu just use the
QEMU_VDE_SOCKET_${id} environment variable, this implicitly chooses a
single vde_switch process for each VLAN.
However, this leads to unusable vde_switch processes being spawned in
each test run and as a side effect makes it impossible to access the
correct VLan objects in the interactive test driver. It also makes it
remarkably hard to understand why the current implementation ever
worked.
The aarch64-linux versions of the boot.uefiUsb and boot.uefiCdrom tests
were broken by b0fc9da879.
That commit was a refactor which omitted the qemuBinary option, which was
previously available in the legacy start command. This restores that
option and fixes the tests previously mentioned.
When displaying the amount of time some step took, with no other
context, it becomes nigh impossible (especially in longer tests) to see
when specific steps finished.
The current implementation just forks off a thread to read
QEMU's stdout and lets it exist forever. This, however,
makes the interpreter shutdown racy, as the thread could
still be running and writing out buffered stdout when the
main thread exits (and since it's using the low level API,
the worker thread does not get cleaned up by the atexit hooks
installed by `threading`, either). So, instead of doing that,
let's create a real `threading.Thread` object, and also
explicitly `join` it along with the other stuff when cleaning up.
We have no usecase for manually/selectively starting or stopping VLANs
in integration tests.
By starting and stopping the VLANs with the constructor and destructor
of VLAN objects, we remove the obligation and complexity to maintain
network lifetime separately.
This commit encapsulates the involved domain into classes and
defines explicit and typed arguments where untyped dicts where used.
It preserves backwards compatibility through legacy wrappers.
Previous to this commit, the entire test driver environment was shared
with the actual python test environment.
This is a hefty api surface. This commit selectively exposes only those
symbols to the test environment that are actually meant to be used by
tests.
This is relevant for `nixos-build-vms(8)` which doesn't have a
test-script. In that case it's more intuitive to directly go into the
interactive mode which is IMHO more intuitive.
Previously the driver was configured exclusively through convoluted
environment variables.
Now the driver's defaults are configured through env variables.
Some additional concerns are in the github comments of this PR.
Less nesting, where that improves readability. More nesteing, where
that improves readability, but most importantly:
Expose individual functions separately so that they can be more easily
built directly, eg.:
`nix build --impure --expr '(import ./testing-python.nix {system = builtins.currentSystem;}).mkTestDriver'`
At nixpkgs root:
`rg redirectSerial ./` does not result in any other match
nor does
`rg USE_SERIAL ./` except for an unrelated match in:
pkgs/tools/graphics/argyllcms/default.nix
Bash's standard behavior of not propagating non-zero exit codes
through a pipeline is unexpected and almost universally
unwanted. Default to setting `pipefail` for the command being run;
it can still be turned off by prefixing the pipeline with
`set +o pipefail` if needed.
Also, set `errexit` and `nonunset` options to make the first command
of consecutive commands separated by `;` fail, and disallow
dereferencing unset variables respectively.
For now you had to know that the actions are retried for 900s when
seeing an error like
> Traceback (most recent call last):
> File "/nix/store/dbvmxk60sv87xsxm7kwzzjm7a4fhgy6y-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 927, in run_tests
> exec(tests, globals())
> File "<string>", line 1, in <module>
> File "<string>", line 31, in <module>
> File "/nix/store/dbvmxk60sv87xsxm7kwzzjm7a4fhgy6y-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 565, in wait_for_file
> retry(check_file)
> File "/nix/store/dbvmxk60sv87xsxm7kwzzjm7a4fhgy6y-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 142, in retry
> raise Exception("action timed out")
> Exception: action timed out
in your (hydra) build failure. Due to the absence of timestamps you were
left guessing if the machine was just slow, someone passed a low timeout
value (which they couldn't until now) or whatever might have happened.
By making this error a bit more descriptive (by including the elapsed
time) these hopefully become more useful.
On my system I have XWayland disabled and therefore only WAYLAND_DISPLAY
is set. This ensures that the graphical output will still be enabled on
such setups (both Wayland and X11 are supported by the viewer).
When performing OCR, some of the Tesseract settings perform better than
others on a variety of different workloads, but they mostly take
~negligible incremental time to run compared to the overhead of running
the ImageMagick filters.
After this commit, we try using all three of the current Tesseract
models (classic, LSTM, and classic+LSTM) to generate output text. This
fixes chromium-90's tests at release-20.09, and should make cases where
you're looking for *specific* text better, with the tradeoff of running
Tesseract multiple times.
To make it sensible to cherrypick this into release-20.09, this doesn't
change the existing API surface for the test driver. In particular,
get_screen_text continues to have the existing behaviour.
According to Python documentation [0], `bufsize=1` is only meaningful in
text mode. As we don't pass in an argument called `universal_newlines`,
`encoding`, `errors` or `text` the file objects aren't opened in text
mode, which means the argument is ignored with a warning in Python 3.8.
line buffering (buffering=1) isn't supported in binary mode,
the default buffer size will be used
This commit removes this warning that appared when using
interactive test driver built with `-A driver`. This is done by
removing `bufsize=1` from Popen calls.
The default parameter when unspecified for `bufsize` is `-1` which
according to the documentation will be interpreted as
`io.DEFAULT_BUFFER_SIZE`. As mentioned by a warning, Python already
uses default buffer size when providing `buffering=1` parameter for
file objects not opened in text mode.
[0]: https://docs.python.org/3/library/subprocess.html#subprocess.Popen
ecb73fd555 introduced a new keepVmState
CLI flag for test-driver.py. This CLI flags gets forwarded to the
Machine class through create_machine.
It created a regression for the boot tests where __main__ end up not
being evaluated. See
https://github.com/NixOS/nixpkgs/pull/97346#issuecomment-690951837 for
bug report.
Defaulting keepVmState to false when __main__ ends up not being
evaluated.
The previous version of the code would only kick in if the state
directory path pointed at a *file*, which never occurs. Making that
codepath actually work reveals an ordering bug, which this patch fixes
as well.
It also replaces the confusing, imperative case log message "delete VM
state directory" with "deleting VM state directory".
Finally, we hint the user about how to prevent this deletion. IE. by
passing the --keep-vm-state flag.
Bug report:
https://github.com/NixOS/nixpkgs/pull/91046#issuecomment-685568750
Credit goes to Edef for the rebase on top of a recent nixpkgs commit
and for writing most of this commit message.
Co-authored-by: edef <edef@edef.eu>
This reverts commit 1bff6fe17c, reversing
changes made to 2995fa48cb.
There’s presumably nothing wrong with this PR, except that it
conflicts with reverting #96254 which broke several tests (#96699).
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
With the Perl driver, machine.sleep(N) was doing a sleep on the guest
machine instead of the host machine. The new Python test driver however
uses time.sleep(), which instead sleeps on the host.
While this shouldn't make a difference most of the time, it *does*
however make a huge difference if the test machine is loaded and you're
sleeping for a minimum duration of eg. an animation.
I stumbled on this while porting most of all my tests to the new Python
test driver and particularily my video game tests failed on a fairly
loaded machine, whereas they don't with the Perl test driver.
Switching the sleep() method to sleep on the guest instead of the host
fixes this.
Signed-off-by: aszlig <aszlig@nix.build>
Keeping the VM state test across several run sometimes lead to subtle
and hard to spot errors in practice. We delete the VM state which
contains (among other things) the qcow volume.
We also introduce a -K (--keep-vm-state) flag making VM state to
persist after the test run. This flag makes test-driver.py to match
its previous behaviour.
xchg is advertised as a bidirectional exchange dir, but file content
transfer from host to VM fails due to caching:
If a file is read in the VM and then modified on the host, subsequent
re-reads in the VM can yield old, cached data.
This is caused by the use of 9p's cache=loose mode that is explicitly
meant for read-only mounts.
9p doesn't provide any suitable cache modes, so fix this by disabling
caching.
Also, remove a now unnecessary sync in the test driver.
These syncs have the goal to transfer host filesystem changes to the VM,
but they have no effect because 1) syncing in the VM can't possibly pull
in host data and 2) 9p is accessing the host filesystem on the cached
layer anyways, so even syncing on the host would have no effect in the
VM.
The test harness provides the commands it wishes to run in Bourne
syntax. This fails if the user uses a different shell. For example,
with fish:
machine.wait_for_unit("graphical-session.target", "alice")
machine # fish: Unsupported use of '='. To run '-u`' with a modified environment, please use 'env XDG_RUNTIME_DIR=/run/user/`id -u`…'
machine # XDG_RUNTIME_DIR=/run/user/`id -u` systemctl --user --no-pager show "graphical-session.target"
machine # ^
machine # [ 16.329957] su[1077]: pam_unix(su:session): session closed for user alice
error: retrieving systemctl info for unit "graphical-session.target" under user "alice" failed with exit code 127
This completes the removal of the nested log feature, which previously
got removed from Nix, Hydra, stdenv and GNU Make. In particular, this
means that the output of VM builds no longer contains a copy of
jQuery.
If a program (e.g. nixos-install) writes more than 1000 lines to
stderr during execute(), then process_serial_output() deadlocks
waiting for the queue to be processed. So use an unbounded queue
instead.
We should probably get rid of the structured log output (log.xml),
since then we don't need the log queue anymore.
The docstring says it uses a directory shared among all vms, although
that doesn't seem necessary for the functionality. However, it does need
to be consistent between the guest and host.
The codec format 'unicode_escape' was introduced in 52ee102 to handle
undecodable bytes in boot menus.
This made the problem worse as unicode chars outside of iso-8859-1
produce garbled output and valid utf-8 strings (such as "\x" ) trigger
decoding errors.
Fix this by using the default 'utf-8' codec and by explicitly ignoring
decoding errors.
This changes the python test driver to match the behavior of the perl
test driver. I.e. the directory mounted into /tmp/shared should be the
same for all machines.
This probably fixes many tests, but I found this while investigating
failures in nixos/tests/ceph-multi-node.nix.
we previously immediately returned the first commands output, and didn't
execute any of the other commands.
Now, return the last commands output.
This should be documented in the method docstring.
Condition seems to be inverted. Crash and shutdown only make sense, when
the machine is booted; i.e. we return immediately otherwise.
In the Perl test driver this is:
return unless $self->{booted};
When IPXE tests were added, an option was added for configuring only
the frontend, and the backend configuration was dropped entirely. This
caused most installer tests to fail.
See #49441 for an earlier attempt, which was subsequently reverted. I am
assuming that doubling the time will be sufficient if the machine is
overloaded since so many of the tests already pass at 5 minutes, while
still not holding back failures for needlessly long.
The ability to specify "-drive if=scsi" has been removed in QEMU version
2.12 (introduced in 3e3b39f173).
Quote from https://wiki.qemu.org/ChangeLog/2.12#Incompatible_changes:
> The deprecated way of configuring SCSI devices with "-drive if=scsi"
> on x86 has been removed. Use an appropriate SCSI controller together
> "-device scsi-hd" or "-device scsi-cd" and a corresponding "-blockdev"
> parameter instead.
So whenever the diskInterface is "scsi" we use the new way to specify
the drive and fall back to the deprecated way for the time being. The
reason why I'm not using the new way for "virtio" and "ide" as well is
because there is no simple generic way anymore to specify these.
This also turns the type of the virtualisation.qemu.diskInterface option
to be an enum, so the user knows which values are allowed but we can
also make sure the right value is provided to prevent typos.
I've tested this against a few non-disk-related NixOS VM tests but also
the installer.grub1 test (because it uses "ide" as its drive interface),
the installer.simple test (just to be sure it still works with
"virtio") and all the tests in nixos/tests/boot.nix.
In order to be able to run the grub1 test I had to go back to
8b1cf100cd (which is a known commit where
that test still works) and apply the QEMU update and this very commit,
because right now the test is broken.
Apart from the tests here in nixpkgs, I also ran another[1] test in
another repository which uses the "scsi" disk interface as well (in
comparison to most of the installer tests, this one actually failed
prior to this commit).
All of them now succeed.
[1]: 9b5a119972/tests/system/kernel/bfq.nix
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @edostra, @grahamc, @dezgeg, @abbradar, @ts468
It is quite complicated to test services using the test-driver when
declaring user services with `systemd.user.services` such as many
X11-based services like `xautolock.service`.
This change adds an optional `$user` parameter to each systemd-related
function in the test-driver and runs `systemctl --user` commands using
`su -l $user -c ...` and sets the `XDG_RUNTIME_DIR` variable
accordingly and a new function named `systemctl` which is able to run a
systemd command with or without a specified user.
The change can be confirmed with a simple VM declaration like this:
```
import ./nixos/tests/make-test.nix ({ pkgs, lib }:
with lib;
{
name = "systemd-user-test";
nodes.machine = {
imports = [ ./nixos/tests/common/user-account.nix ];
services.xserver.enable = true;
services.xserver.displayManager.auto.enable = true;
services.xserver.displayManager.auto.user = "bob";
services.xserver.xautolock.enable = true;
};
testScript = ''
$machine->start;
$machine->waitForX;
$machine->waitForUnit("xautolock.service", "bob");
$machine->stopJob("xautolock.service", "bob");
$machine->startJob("xautolock.service", "bob");
$machine->systemctl("list-jobs --no-pager", "bob");
$machine->systemctl("show 'xautolock.service' --no-pager", "bob");
'';
})
```
machine: must succeed: xwininfo -root -tree | sed 's/.*0x[0-9a-f]* \"\([^\"]*\)\".*/\1/; t; d'
machine: exit status 0
machine: Last chance to match /(?^:dfiirst configuration)/ on the the window list, which currently contains:
machine: [i3 con] container around 0xf8a5f0, i3: first configuration, [i3 con] floatingcon around 0xf8c260, [i3 con] container around 0xf8a380, i3bar for output Virtual-1, [i3 con] bottom dockarea Virtual-1, [i3 con] workspace 1, [i3 con] content Virtual-1, [i3 con] top dockarea Virtual-1, [i3 con] output Virtual-1, [i3 con] workspace __i3_scratch, [i3 con] content __i3, [i3 con] pseudo-output __i3, i3
machine: Last chance to match /(?^:BALICE)/ on the screen, which currently contains:
machine: performing optical character recognition
machine: sending monitor command: screendump /tmp/nix-build-vm-test-run-sddm.drv-0/ocrin.ppm
machine: Session Layout
O O
0 1 : 0 9
Wednesday, June 21, 2017
|_ I
Select your user and enter password
If the test has not passed yet, on the last attempt it now outputs:
machine: Last chance to match /logine: / on TTY2, which currently contains:
machine: running command: fold -w$(stty -F /dev/tty2 size | awk '{print $2}') /dev/vcs2
machine: exit status 0
machine:
<<< Welcome to NixOS 17.09.git.a804ef4 (x86_64) - tty2 >>>
machine login:
to help debug the problem. Notice the "logine" typo in my check.
First of all, we're now using ImageMagick to improve the screenshot so
that Tesseract has an esier time to recognize the text. The resulting
image of this post-processing is a scaled up black-and-white version
with the backgrounds almost entirely removed and the text edges a bit
blurred, so the screen shots now more or less resemble an image from a
scanner rather. This is what Tesseract is trained for by default.
As mentioned in the previous commit we now also use Tesseract 4, which
further improves the quality of text recognition.
I've spent countless hours just to test different postprocessing
variants and testing what works best for our tests and this is the one
that worked best so far. It's certainly not perfect and I'd like to
avoid the scaling step but we're way better off than before.
In addition to this, the OCR process is now done without an intermediate
file, solely using pipes.
I've tested this using the following VM tests which have OCR enabled:
* nixos/tests/chromium.nix -A stable
* nixos/tests/emacs-daemon.nix
* nixos/tests/installer.nix -A luksroot
* nixos/tests/lightdm.nix
* nixos/tests/plasma5.nix
* nixos/tests/sddm.nix
All of the tests still succeed and comparing some of the recognition
results to the earlier results it now also detects a lot more text than
before this commit.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
A long-time issue and one of the reasons I've never used that function
before. So let's remove that todo-comment and escape the contents
properly.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @edolstra
Regression introduced by d84741a4bf.
The mentioned commit actually is a good thing, because we now get the
output from the X session.
Unfortunately, for the i3wm test, the i3-config-wizard prints out the
raw keyboard symbols directly coming from xcb, so the output isn't
necessarily proper UTF-8.
As the XML::Writer already expects valid UTF-8 input, we assume that
everything that comes into sanitise() will be UTF-8 from the start. So
we just decode() it using FB_DEFAULT as the check argument so that
every invalid character is replaced by the unicode replacement
character:
https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character
We simply re-oncode it again afterwards and return it, so we should
always get out valid UTF-8 in the log XML.
For more information about FB_DEFAULT and FB_CROAK, have a look at:
http://search.cpan.org/~dankogai/Encode-2.84/Encode.pm#Handling_Malformed_Data
Signed-off-by: aszlig <aszlig@redmoonstudios.org>