Using __del__ is somewhat unsound resource cleanup in our clase the
logger already closed its logfile and therefor fails with exception
before the rest of the resources can be cleaned up.
We remove the global rootlog in favor of instantiating the logger as
required in the __init__.py and pass it down as a parameter (of our
AbstractLogger type).
Previously, the XML logging was always present and only created an
output file if a special environment variable was present. We now only
create the XML logger if the environment variable is present, saving us
from logging to XML internally if it is not required.
We add a new logger that allows generating a junit-xml compatible report
listing the subtests used in the nixos integration test. Junit-xml is a
widely used standard for test reports. The report can be used for quick
evaluation of which subtest failed.
We use the newly AbstractLogger class and separate the XML and Terminal
logging that is currently mixed into one class. We restore the old
behavior by introducing a CompositeLogger that takes care of logging
both to terminal and XML.
We do not use the generic "nested" function but introduce a separate
subtest log call. This will later allow us to track subtests and account
logs to specific subtests.
As the TODO says, this is already included by the script.
If adding a device, including this again here would result in either
two devices being added, or, if they were explicitly named, an error
due to reuse of the name.
Adds a function to wait for a new QMP event with a model filter
so that you can expect specific type of events with specific payloads.
e.g. a guest-reset-induced shutdown event.
Since the debut of the test-driver, we didn't obtain
a race timer with the test execution to ensure that tests doesn't run beyond
a certain amount of time.
This is particularly important when you are running into hanging tests
which cannot be detected by current facilities (requires more pvpanic wiring up, QMP
API stuff, etc.).
Two easy examples:
- Some QEMU tests may get stuck in some situation and run for more than 24 hours → we default to 1 hour max.
- Some QEMU tests may panic in the wrong place, e.g. UEFI firmware or worse → end users can set a "reasonable" amount of time
And then, we should let the retry logic retest them until they succeed and adjust
their global timeouts.
Of course, this does not help with the fact that the timeout may need to be
a function of the actual busyness of the machine running the tests.
This is only one step towards increased reliability.
Now that we have a QMP client, we can wire it up in the test driver.
For now, it is almost completely useless because of the need of a constant "event loop", especially
for event listening.
In the next commits, we will slowly enable more and more usecases.
When listening on unix sockets, it doesn't make sense to specify a port
for nginx's listen directive.
Since nginx defaults to port 80 when the port isn't specified (but the
address is), we can change the default for the option to null as well
without changing any behaviour.
Since 008f9f0cd4
("nixos/test-driver: actually use the backdoor message to wait for backdoor"),
when boot is still computering, we can get a tons of empty strings in response to the shell.
This is not really useful to print and waste the disk space for any CI system that logs them.
We stop logging chunks whenever they are empty.
While working on #192270, I noticed that only some wait_for_* helper
functions make the timeout configurable. I think we should be able to
customize it in all cases
New EDK2 sets up the backdoor port as a serial console, which feeds the test driver
a bunch of boot logs it can safely ignore. Do so by waiting for the message the
backdoor shell prints before doing anything else.
By some miracle, before, it was possible to reconnect to the `node1` without
doing any relevant dance.
But now we are direct booting (¿), it seems like we need to do the right things.
This introduces a `check_output` flag for `execute` because we do not want to steal the
messages from the backdoor service as we might execute the kexec too fast compared
to when we will reconnect.
Therefore, we will let the message in the pipe if needed.
- `wait_until_fails` was not passing through its `timeout` argument to
the internal `retry` function, hence was always using 900 seconds (the
default timeout for `retry`) rather than the user-specified value.