Some upstream changes broke the automatic fallback to SwiftShader.
Without this workaround the GPU initialization fails (apparently due to requiring Vulkan):
machine # libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
machine # [1001:1001:1101/104304.000625:ERROR:viz_main_impl.cc(161)] Exiting GPU process due to errors during initialization
machine # libva error: vaGetDriverNameByIndex() failed with unknown libva error, driver_name = (null)
machine # [1052:1052:1101/104305.060496:ERROR:viz_main_impl.cc(161)] Exiting GPU process due to errors during initialization
machine # [1084:1084:1101/104305.165898:ERROR:angle_platform_impl.cc(44)] Display.cpp:894 (initialize): ANGLE Display::initialize error 0: Internal Vulkan error (-3): Initialization of an object could not be completed for implementation-specific reasons, in ../../third_party/angle/src/libANGLE/renderer/vulkan/RendererVk.cpp, initialize:1048.
machine # [1084:1084:1101/104305.171191:ERROR:gl_surface_egl.cc(782)] EGL Driver message (Critical) eglInitialize: Internal Vulkan error (-3): Initialization of an object could not be completed for implementation-specific reasons, in ../../third_party/angle/src/libANGLE/renderer/vulkan/RendererVk.cpp, initialize:1048.
machine # [1084:1084:1101/104305.178707:ERROR:gl_surface_egl.cc(1382)] eglInitialize SwANGLE failed with error EGL_NOT_INITIALIZED
machine # [1084:1084:1101/104305.180111:ERROR:gl_ozone_egl.cc(20)] GLSurfaceEGL::InitializeOneOff failed.
machine # [1084:1084:1101/104305.189760:ERROR:viz_main_impl.cc(161)] Exiting GPU process due to errors during initialization
machine # [1092:1092:1101/104305.233470:ERROR:gpu_init.cc(457)] Passthrough is not supported, GL is disabled, ANGLE is
This workaround restores the previous result:
machine # [1004:1004:1101/104551.131190:ERROR:gpu_init.cc(457)] Passthrough is not supported, GL is swiftshader, ANGLE is
The workaround is only required for Chromium, not Google Chrome.
Some specialisations (such as those which affect various boot-time
attributes) cannot be switched to at runtime. This allows picking the
specialisation at boot time.
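For illustration, a minimal sketch of such a specialisation (the hardened kernel is just an example of a boot-time attribute; it can only be picked from the boot menu, not via switch-to-configuration):

```nix
{ pkgs, ... }: {
  # Selecting this specialisation requires a reboot, since the kernel
  # cannot be swapped at runtime.
  specialisation.hardened-kernel.configuration = {
    boot.kernelPackages = pkgs.linuxPackages_hardened;
  };
}
```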
Massively reduce the time it takes to run the test by building a
proper root disk image and increasing the virtualized core count to
4. This should make it much easier for the tests to pass even on
weaker systems.
With my laptop (AMD Ryzen 7 PRO 2700U) as the reference system, I see
the following test run times:
- No change:
Times out after 28 mins
- Building a root image:
7 mins, 48 secs
- Building a root image and bumping the core count:
7 mins, 17 secs
The times include the time it takes to build the image
(~1 min, 20 secs).
Massively reduce the time it takes to run the test by building a
proper root disk image and increasing the virtualized core count to
4. This should make it much easier for the tests to pass even on
weaker systems.
With my laptop (AMD Ryzen 7 PRO 2700U, 32GB RAM) as the reference
system, I see the following test run times:
- No change:
25 mins, 49 secs
- Building a root image:
4 mins, 44 secs
- Building a root image and bumping the core count:
3 mins, 6 secs
The times include the time it takes to build the image (~40 secs).
pathsInNixDB isn't a very accurate name when a Nix store image is
built (virtualisation.useNixStoreImage); rename it to additionalPaths,
which should be general enough to cover both cases.
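Rough usage sketch (assuming the renamed option lives next to `useNixStoreImage` under `virtualisation`, and using `pkgs.hello` purely as an example path):

```nix
{ pkgs, ... }: {
  virtualisation.useNixStoreImage = true;
  # Previously virtualisation.pathsInNixDB: extra store paths made
  # available to the guest.
  virtualisation.additionalPaths = [ pkgs.hello ];
}
```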
The existing tests for HDFS and YARN only check whether the services come up and expose their web interfaces.
The new combined hadoop test will also check whether the services and roles work together as intended.
It spins up an HDFS+YARN cluster and submits a demo YARN application that uses HDFS for
storage and YARN for compute.
Nextcloud 20 will be EOLed by the end of this month [1].
Since the recommended default (that didn't raise an eval-warning) on
21.05 was Nextcloud 21, this shouldn't affect too many people.
In order to ensure that nobody does a (non-working) upgrade across
several major versions of Nextcloud, I replaced the derivation of
`nextcloud20` with a `throw` that provides instructions on how to proceed.
The only case that I consider "risky" is a setup upgraded from 21.05 (or
older) with a `system.stateVersion` <21.11 and with
`services.nextcloud.package` not explicitly declared in its config. To
avoid that, I also left the `else-if` for `stateVersion < 21.03` which
now sets `services.nextcloud.package` to `pkgs.nextcloud20` and thus
leads to an eval-error. This condition can be removed
as soon as 21.05 is EOL, because then it's safe to assume that only
21.11 is used as the stable release, where no Nextcloud <= 20 exists that
can lead to such an issue.
It can't be removed earlier because then every `system.stateVersion <
21.11` would lead to `nextcloud21` which is a problem if `nextcloud19`
is still used.
[1] https://docs.nextcloud.com/server/20/admin_manual/release_schedule.html
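For illustration, a condensed sketch of the resulting package-selection logic (simplified; not the exact module code, and the final fallback package is only illustrative):

```nix
{ config, lib, pkgs, ... }: {
  services.nextcloud.package = lib.mkDefault (
    if lib.versionOlder config.system.stateVersion "21.03"
    then pkgs.nextcloud20   # now a throw with instructions on how to upgrade
    else if lib.versionOlder config.system.stateVersion "21.11"
    then pkgs.nextcloud21
    else pkgs.nextcloud22   # illustrative: whatever the current default is
  );
}
```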
during the rewrite the checkPasswords=false feature of the old module
was lost. restore it, and with it support for systems that allow any client
to use any username.
borg is able to process stdin during backups when backing up the special path -,
which can be very useful for backing up things that can be streamed (e.g.
database dumps, zfs snapshots).
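A hedged sketch of how a job could use this; only the special `-` path is documented borg behaviour, while the `paths = null` / `dumpCommand` option names are assumptions about the module:

```nix
{ config, pkgs, ... }: {
  services.borgbackup.jobs.pgdump = {
    repo = "/var/backup/borg";
    encryption.mode = "none";
    startAt = "daily";
    paths = null;   # assumed: no filesystem paths, stream from stdin instead
    # assumed option name: command whose stdout is piped into `borg create ... -`
    dumpCommand = pkgs.writeScript "dump-postgres" ''
      #!${pkgs.runtimeShell}
      exec ${config.services.postgresql.package}/bin/pg_dumpall
    '';
  };
}
```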
mosquitto needs a lot of attention concerning its config because it doesn't
parse it very well, often ignoring trailing parts of lines, ignoring duplicated
config keys, or looking back much further in the file than might be expected
to associate config keys with previously defined items.
this replaces the mosquitto module completely. we now have a hierarchical config
that flattens out to the mosquitto format (hopefully) without introducing spooky
action at a distance.
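for orientation, a rough sketch of the new hierarchical format (attribute names may differ in detail from the final module):

```nix
{
  services.mosquitto = {
    enable = true;
    listeners = [
      {
        port = 1883;
        users.monitor = {
          password = "verysecret";
          acl = [ "read sensors/#" ];
        };
      }
    ];
  };
}
```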
This commit changes a lot more than you'd expect, but it also adds a lot
of new testing code so nothing breaks in the future. The main change is
that sockets are now restarted when they change. The main reason for
the large amount of changes is the ability of activation scripts to
restart/reload units. This also works for socket-activated units now,
and honors reloadIfChanged and restartIfChanged. The two changes don't
really work without each other, so they are done in one large commit.
The test should show what works now and ensure it will continue to do so
in the future.
allows configuration of foo-over-udp decapsulation endpoints. sadly networkd
seems to lack the features necessary to support local and peer address
configuration, so those are only supported when using scripted configuration.
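sketch of what such an endpoint could look like (the attribute path and names here are assumptions, not authoritative):

```nix
{
  # decapsulate GUE traffic arriving on UDP port 5555 (assumed option path)
  networking.fooOverUDP.fou1 = {
    port = 5555;
  };
}
```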
A few minor changes to get #119638 - nextcloud: add option to set
datadir and extensions - ready:
* `cfg.datadir` now gets `cfg.home` as default to make the type
  non-nullable (see the sketch below).
* Enhanced the `basic` test to check the behavior with a custom datadir
that's not `/var/lib/nextcloud`.
* Fix hashes for apps in option example.
* Simplify if/else for `appstoreenable` in override config.
* Simplify a few `mapAttrsToList`-expressions in
`nextcloud-setup.service`.
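Usage sketch for the new `datadir` option referenced above (it falls back to the value of `home`; the hostname is just an example):

```nix
{
  services.nextcloud = {
    enable = true;
    hostName = "nextcloud.example.org";
    home = "/var/lib/nextcloud";
    datadir = "/mnt/data/nextcloud";   # new option; defaults to cfg.home
  };
}
```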
The multipath-tools package had existed in Nixpkgs for some time but
without a NixOS module to configure/drive it. This module provides
attributes to drive the majority of multipath configuration options
and is being successfully used in stage-1 and stage-2 boot to mount
/nix from a multipath-serviced iSCSI volume.
Credit goes to @grahamc for early contributions to the module and
authoring the NixOS module test.
Don't worry, it's true by default. But I think this is important to
have because NixOS indeed shouldn't need Nix at run time when the
installation is not being modified, and now we can verify that.
NixOS images that cannot "self-modify" are a legitimate use-case
that this supports more minimally. One should be able to e.g. do an
sshfs mount and use `nixos-install` to modify them remotely, or just
discard them and build fresh ones if they are run as VMs or something.
The next step would be to make generations optional, allowing just
baking `/etc` and friends rather than using activation scripts. But
that's more involved so I'm leaving it out.
MariaDB 10.6 doesn't seem to be supported by current Nextcloud
versions and the test fails with the following error [1]:
nextcloud # [ 14.950034] nextcloud-setup-start[1001]: Error while trying to initialise the database: An exception occurred while executing a query: SQLSTATE[HY000]: General error: 4047 InnoDB refuses to write tables with ROW_FORMAT=COMPRESSED or KEY_BLOCK_SIZE.
According to a support thread in upstream's Discourse [2], this is because
support is simply missing so far.
Considering that we haven't received any bug reports so far - even though
the issue already exists on master - and the workaround [3] appears to
work fine, an evaluation warning for administrators should be
sufficient.
[1] https://hydra.nixos.org/build/155015223
[2] https://help.nextcloud.com/t/update-to-next-cloud-21-0-2-has-get-an-error/117028/15
[3] setting `innodb_read_only_compressed=0`
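The workaround [3] boils down to a single database setting, roughly (sketch; assumes it is applied through `services.mysql.settings`):

```nix
{
  services.mysql.settings.mysqld = {
    # lets Nextcloud keep creating ROW_FORMAT=COMPRESSED tables on MariaDB 10.6
    innodb_read_only_compressed = 0;
  };
}
```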
Moving the service before multi-user.target (so the `hardened` test
continues to work the way it did before) can result in locking the kernel
too early. It's better to lock it a bit later and to change the test to
wait specifically for disable-kernel-module-loading.service.
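In the test this amounts to waiting on the specific unit, e.g. (sketch of the test-script change):

```nix
{
  testScript = ''
    # wait for module loading to actually be disabled instead of relying
    # on multi-user.target ordering
    machine.wait_for_unit("disable-kernel-module-loading.service")
  '';
}
```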
The current name is misleading: the file doesn't contain CLI arguments,
but rather several constants and utility functions related to QEMU.
This commit also removes the use of `with import ...` for clarity.
Previously the influxdb exporter test was flaky: even after the
service has started, there is still a race before the service is actually
listening and accepting connections on port 9122.
With this commit the test will wait for the port to be open before
proceeding.
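Sketch of the added wait (the exporter's unit name here is an assumption):

```nix
{
  testScript = ''
    machine.wait_for_unit("prometheus-influxdb-exporter.service")
    # don't just rely on the unit being active; wait until the port
    # actually accepts connections
    machine.wait_for_open_port(9122)
  '';
}
```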
Hydra accepts timeouts as a value in seconds after which the test is
terminated / considered failed. Using the value 30 here has the effect
that the test was terminated after 30 seconds. That time might be
sufficient for the test execution itself but it has another downside:
Jobs on hydra inherit the timeout of their parent. In this case all the
builds that are a dependency of the herbstluftwm test *must* finish
(each) within 30s. And since not all of the dependencies are cached in
the binary cache, this could lead to an issue with packages that take
longer than 30s to build at the time when the herbstluftwm test is built
by hydra.
It is best to not set the timeout here and let hydra deal with it. Our
default timeout for builds is two hours which is more than sufficient
for most builds and tests. If the test fails we will spend ~2h doing
something or nothing at worst, but at least we won't kill the build just
because a dependency wasn't fulfilled yet.
This updates systemd to version v249.4 from version v247.6.
Besides the many new features that can be found in the upstream
repository, upstream also did a bunch of cleanup which ended up
requiring a few more patches on our side.
a) 0022-core-Handle-lookup-paths-being-symlinks.patch:
The way symlinked units were handled was changed such that the last
name of a unit file within one of the unit directories
(/run/systemd/system, /etc/systemd/system, ...) is used as the name
for the unit. Unfortunately that code didn't take into account that
the unit directories themselves could already be symlinks, and thus
caused all our units to be recognized slightly differently.
There is an upstream PR for this new patch:
https://github.com/systemd/systemd/pull/20479
b) The way the APIVFS is set up has been changed in such a way that we
now always have /run. This required a few changes to the
confinement tests, which asserted that it didn't exist. Instead of
adding another patch we can just adopt the upstream behavior. An
empty /run doesn't seem harmful.
As part of this work I refactored the confinement test just a little
bit to allow better debugging of test failures. Previously it would
just fail at some point and it wasn't obvious which of the many
commands failed or what the unexpected string was. This should now be
more obvious.
c) Again related to the confinement tests: the way a file is tested for
being accessible was optimized. Previously systemd would in some
situations open a file twice during that check. This was reduced to
one operation but requires procfs to be mounted in a unit's
namespace.
An upstream bug was filed and fixed. We are now carrying the
essential patch to fix that issue until it is backported to a new
release (likely only version 250). The good part about this story is
that upstream systemd now has a test case that looks very similar to
one of our confinement tests. Hopefully that will lead to less
friction in the long run.
https://github.com/systemd/systemd/issues/20514
https://github.com/systemd/systemd/pull/20515
d) Previously we could grep for dlopen( somewhat reliably, but now
upstream started using a wrapper around dlopen that is most of the
time used with line breaks. This makes grepping for it no longer
ergonomic.
With this bump we are grepping for anything that looks like a
dynamic library name (in contrast to a dlopen(3) call) and replacing
those instead. That seems more robust. Time will tell if this holds.
I tried using coccinelle to patch all those call sites using its
tooling, but unfortunately it stumbles upon the _cleanup_
annotations that are very common in the systemd code.
e) We now have some machinery for libbpf support in our systemd build.
That being said, it doesn't actually work yet, as generating some of
the skeletons fails. It fails with the below error message and is
disabled by default (in both the minimal and the regular build).
> FAILED: src/core/bpf/socket_bind/socket-bind.skel.h
> /build/source/tools/build-bpf-skel.py --clang_exec /nix/store/x1bi2mkapk1m0zq2g02nr018qyjkdn7a-clang-wrapper-12.0.1/bin/clang --llvm_strip_exec /nix/store/zm0kqan9qc77x219yihmmisi9g3sg8ns-llvm-12.0.1/bin/llvm-strip --bpftool_exec /nix/store/l6dg8jlbh8qnqa58mshh3d8r6999dk0p-bpftools-5.13.11/bin/bpftool --arch x86_64 ../src/core/bpf/socket_bind/socket-bind.bpf.c src/core/bpf/socket_bind/socket-bind.skel.h
> libbpf: elf: socket_bind_bpf is not a valid eBPF object file
> Error: failed to open BPF object file: BPF object format invalid
> Traceback (most recent call last):
> File "/build/source/tools/build-bpf-skel.py", line 128, in <module>
> bpf_build(args)
> File "/build/source/tools/build-bpf-skel.py", line 92, in bpf_build
> gen_bpf_skeleton(bpftool_exec=args.bpftool_exec,
> File "/build/source/tools/build-bpf-skel.py", line 63, in gen_bpf_skeleton
> skel = subprocess.check_output(bpftool_args, universal_newlines=True)
> File "/nix/store/81lwy2hfqj4c1943b1x8a0qsivjhdhw9-python3-3.9.6/lib/python3.9/subprocess.py", line 424, in check_output
> return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
> File "/nix/store/81lwy2hfqj4c1943b1x8a0qsivjhdhw9-python3-3.9.6/lib/python3.9/subprocess.py", line 528, in run
> raise CalledProcessError(retcode, process.args,
> subprocess.CalledProcessError: Command '['/nix/store/l6dg8jlbh8qnqa58mshh3d8r6999dk0p-bpftools-5.13.11/bin/bpftool', 'g', 's', '../src/core/bpf/socket_bind/socket-bind.bpf.o']' returned non-zero exit status 255.
> [102/1457] Compiling C object src/journal/libjournal-core.a.p/journald-server.c.oapture output)put)ut)
> ninja: build stopped: subcommand failed.
f) We now have support for TPM2-based disk encryption in our
systemd build. The actual bits and pieces to make use of that are
still missing, but there are various ongoing efforts in that direction.
There is also ongoing work on systemd in our initrd, which would
enable this to be used for root volumes. None of this works out of
the box yet, but we can start improving on that front.
g) FIDO2 support was added to systemd and consequently we can now use
that. Just as with TPM2, there hasn't been any integration work with
NixOS yet; this just adds the capability so that such work can happen.
Co-Authored-By: Jörg Thalheim <joerg@thalheim.io>
I currently do not have much time to work on nixpkgs. Remove
myself as a maintainer from a bunch of packages to avoid people
waiting on me for a review.
The primary use case is tools like sops-nix and agenix restarting units
when secrets change. There are probably other reasons to restart units as
well, and it's a nice thing to have in general.
These are all packages that I stopped using, so they just create noise
in my inbox for each change affecting them. And let's face it: while I
still enjoy contributing to nixpkgs, it doesn't really make sense to be
listed there if I can't do much anyway.
Each of these packages can be taken over by someone or removed if
people think that's reasonable.
Of course, if other maintainers face issues, I can answer some questions
if needed & possible.
This doesn't work anymore and thus breaks the installation, leaving a
broken `/var/lib/nextcloud`.
It isn't a big deal since we set this value in the override config
beforehand, so the correct table prefix is still used. In order to confirm
that, I added a custom prefix to the basic test.