When displaying the amount of time some step took, with no other
context, it becomes nigh impossible (especially in longer tests) to see
when specific steps finished.
The current implementation just forks off a thread to read
QEMU's stdout and lets it exist forever. This, however,
makes the interpreter shutdown racy, as the thread could
still be running and writing out buffered stdout when the
main thread exits (and since it's using the low level API,
the worker thread does not get cleaned up by the atexit hooks
installed by `threading`, either). So, instead of doing that,
let's create a real `threading.Thread` object, and also
explicitly `join` it along with the other stuff when cleaning up.
We have no usecase for manually/selectively starting or stopping VLANs
in integration tests.
By starting and stopping the VLANs with the constructor and destructor
of VLAN objects, we remove the obligation and complexity to maintain
network lifetime separately.
This commit encapsulates the involved domain into classes and
defines explicit and typed arguments where untyped dicts where used.
It preserves backwards compatibility through legacy wrappers.
Previous to this commit, the entire test driver environment was shared
with the actual python test environment.
This is a hefty api surface. This commit selectively exposes only those
symbols to the test environment that are actually meant to be used by
tests.
This is relevant for `nixos-build-vms(8)` which doesn't have a
test-script. In that case it's more intuitive to directly go into the
interactive mode which is IMHO more intuitive.
Previously the driver was configured exclusively through convoluted
environment variables.
Now the driver's defaults are configured through env variables.
Some additional concerns are in the github comments of this PR.
Less nesting, where that improves readability. More nesteing, where
that improves readability, but most importantly:
Expose individual functions separately so that they can be more easily
built directly, eg.:
`nix build --impure --expr '(import ./testing-python.nix {system = builtins.currentSystem;}).mkTestDriver'`
At nixpkgs root:
`rg redirectSerial ./` does not result in any other match
nor does
`rg USE_SERIAL ./` except for an unrelated match in:
pkgs/tools/graphics/argyllcms/default.nix
Bash's standard behavior of not propagating non-zero exit codes
through a pipeline is unexpected and almost universally
unwanted. Default to setting `pipefail` for the command being run;
it can still be turned off by prefixing the pipeline with
`set +o pipefail` if needed.
Also, set `errexit` and `nonunset` options to make the first command
of consecutive commands separated by `;` fail, and disallow
dereferencing unset variables respectively.
For now you had to know that the actions are retried for 900s when
seeing an error like
> Traceback (most recent call last):
> File "/nix/store/dbvmxk60sv87xsxm7kwzzjm7a4fhgy6y-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 927, in run_tests
> exec(tests, globals())
> File "<string>", line 1, in <module>
> File "<string>", line 31, in <module>
> File "/nix/store/dbvmxk60sv87xsxm7kwzzjm7a4fhgy6y-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 565, in wait_for_file
> retry(check_file)
> File "/nix/store/dbvmxk60sv87xsxm7kwzzjm7a4fhgy6y-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 142, in retry
> raise Exception("action timed out")
> Exception: action timed out
in your (hydra) build failure. Due to the absence of timestamps you were
left guessing if the machine was just slow, someone passed a low timeout
value (which they couldn't until now) or whatever might have happened.
By making this error a bit more descriptive (by including the elapsed
time) these hopefully become more useful.