Currently libvirt requires two qemu derivations: qemu and qemu_kvm, which is just a truncated version of qemu (defined as qemu.override { hostCpuOnly = true; }).
This patch exposes an option virtualisation.libvirtd.qemuPackage which allows choosing which package to use:
* pkgs.qemu_kvm if all your guests have the same CPU as the host, or
* pkgs.qemu, which can emulate foreign architectures (for example ARMV7L on X86_64), or
* a custom derivation.
The virtualisation.libvirtd.enableKVM option is vague and could be deprecated in favor of virtualisation.libvirtd.qemuPackage, although it does allow enabling or disabling KVM.
Without this, when you've enabled networkmanager and start a
nixos-container the container will briefly have its specified IP
address but then networkmanager starts managing it causing the IP
address to be dropped.
This is required on the ThunderX CPUs on the Packet.net Type-2A
machines that have a GICv3. For some reason the default is to create a
GICv2 independent of the host hardware...
This is required by the new c5.* instance types.
Note that this changes disk names from /dev/xvd* to
/dev/nvme0n*. Amazon Linux has a udev rule that calls a Python script
named "ec2nvme-nsid" to create compatibility symlinks. We could use
that, but it would mean adding Python to the AMI closure...
Unlike pathsFromGraph, on Nix 1.12, this function produces a
registration file containing correct NAR hash/size information.
https://hydra.nixos.org/build/62832723
-s, --script: never prompts for user intervention
Sometimes the NixOS installer tests fail when they invoke parted, e.g.
https://hydra.nixos.org/build/62513826/nixlog/1. But instead of exiting
right there, the tests hang until the Nix builder times out (and kills
the build). With this change the tests would instead fail immediately,
which is preferred.
While at it, use "parted --script" treewide, so nobody gets a build
timeout due to a parted error (or misuse). (Only nixos/ uses it, and only
non-interactively.)
A few instances already use the short option "-s"; convert them to the long
option "--script".
The container config example code mentions the `postgresql` service, but correct use of that service involves setting the `system.stateVersion` option (as discovered in https://github.com/NixOS/nixpkgs/issues/30056).
The system state version is set arbitrarily to 17.03 because I have no preference here.
There are currently two ways to build an OpenStack image. This picks the
best of both, so that only one remains:
- Image is resizable
- Cloud-init is enabled
- Password authentication is disabled by default
- Use the same layer as other image builders (ec2, gce...)
Although it is quite safe to restart `libvirtd` when there are only `qemu` machines, if there are `libvirt_lxc` containers a restart may put the whole system into an odd state: the containers keep running but the new `libvirtd` daemons do not see them.
This allows running the prune job periodically on a machine.
If enabled, the job runs once a week by default.
The structure is similar to how system.autoUpgrade works.
Use xmlstarlet to update the OVMF path on each startup, like we do for
<emulator>...qemu-kvm</emulator>.
A libvirt domain using UEFI cannot start if the OVMF path is garbage
collected/missing.
This replaces grep and sed, which are brittle.
(I don't know how to preserve the comment we currently add to say that
this line is auto-updated. But I don't think it adds much value, so I'm
not spending any effort on it.)
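The point of the change above is to edit the domain XML structurally instead of with line-based text substitution. The following Python sketch (using `xml.etree.ElementTree` rather than xmlstarlet) illustrates the same idea. The function name `update_domain_paths` and the example paths are hypothetical, and it assumes the loader lives at `<os><loader>` and the emulator at `<devices><emulator>`, as in libvirt domain XML.

```python
import xml.etree.ElementTree as ET


def update_domain_paths(domain_xml: str, emulator: str, ovmf_code: str) -> str:
    """Rewrite the <emulator> and UEFI <loader> paths in a libvirt domain XML (sketch)."""
    root = ET.fromstring(domain_xml)
    # Point every <devices><emulator> element at the current qemu binary.
    for node in root.findall("./devices/emulator"):
        node.text = emulator
    # Point the UEFI firmware loader at the current OVMF image, if one is configured;
    # attributes such as type='pflash' are left untouched.
    for node in root.findall("./os/loader"):
        node.text = ovmf_code
    return ET.tostring(root, encoding="unicode")
```

Unlike a sed expression, this cannot accidentally match text that merely looks like a path elsewhere in the file, and it keeps working if libvirt reorders attributes or reflows whitespace.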
This commit adds the xen_4_8 package to be used instead of
xen (currently at 4.5.5):
* Add packages xen_4_8, xen_4_8-slim and xen_4_8-light
* Add packages qemu_xen_4_8 and qemu_xen_4_8-light to be used
with xen_4_8-slim and xen_4_8-light respectively.
* Add systemd to buildInputs of xen (it is required by oxenstored)
* Adapt xen service to work with the new version of xen
* Use xen-init-dom0 to initialise dom0 in xen-store
* Currently, the virtualisation.xen.stored option is ignored
if xen 4.8 is used
OVMF{,CODE,VARS}.fd are now available in a dedicated fd output, greatly
reducing the closure in the common case where only those files are used (a
few MBs versus several hundred MBs for the full OVMF).
Note: it's unclear why `dontPatchELF` is now necessary for the build to
pass (on my end, at any rate) but it doesn't make much sense to run this
fixup anyway.
Note: my reading of xen's INSTALL suggests that --with-system-ovmf should
point directly to the OVMF binary. As such, the previous invocation was
incorrect (it pointed to the root of the OVMF tree). In any case, I have
only built xen with `--with-system-ovmf`; I have not tested it.
Fixes https://github.com/NixOS/nixpkgs/issues/25854
Closes https://github.com/NixOS/nixpkgs/pull/25855
Provide the option forwardDns in virtualisation.xen.bridge, which
enables forwarding of DNS queries to the default resolver, allowing
outside internet access for the xen guests.
The xen-bridge service accepts the option prefixLength, but does not
use it to set the actual netmask on the bridge. This commit makes
it set the correct netmask.
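The bridge's netmask follows directly from prefixLength. The following minimal Python sketch shows the conversion using the standard `ipaddress` module. The helper name `netmask_for_prefix` is hypothetical and is not part of the xen-bridge service.

```python
import ipaddress


def netmask_for_prefix(prefix_length: int) -> str:
    """Return the dotted-quad IPv4 netmask for a prefix length (e.g. 24 -> 255.255.255.0)."""
    if not 0 <= prefix_length <= 32:
        raise ValueError(f"invalid IPv4 prefix length: {prefix_length}")
    # 0.0.0.0/N has no host bits set, so strict parsing accepts every valid N.
    return str(ipaddress.IPv4Network(f"0.0.0.0/{prefix_length}").netmask)
```

For example, a bridge configured with prefixLength 16 should end up with netmask 255.255.0.0.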
QEMU can allow guests to access more than one host core at a time.
Previously, this had to be done via ad-hoc arguments:
virtualisation.qemu.options = ["-smp 12"];
Now you can simply specify:
virtualisation.cores = 12;
Unfortunately, somewhere between 16.09 and 17.03, paravirtualized
instances stopped working. They hang at the pv-grub prompt
("grubdom>"). I tried reverting to a 4.4 kernel, reverting kernel
compression from xz to bzip2 (even though pv-grub is supposed to
support xz), and reverting the only change to initrd generation
(5a8147479e). Nothing worked so I'm
giving up.
The Docker socket is world-writable. This means any user on the system is
able to invoke the docker command (which is equivalent to having root access
to the machine).
This commit makes the socket group-writable and owned by the docker group.
Inspired by
https://github.com/docker/docker/blob/master/contrib/init/systemd/docker.socket
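The difference between the old and new socket permissions comes down to the "other" write bit. The following Python sketch checks a mode value with the standard `stat` constants. The function names are hypothetical. On a live system you would pass `os.stat(path).st_mode` for the socket path.

```python
import stat


def is_world_writable(mode: int) -> bool:
    """True if every user on the system may write to (i.e. connect to) this socket."""
    return bool(mode & stat.S_IWOTH)


def is_group_only_writable(mode: int) -> bool:
    """True if owner and group may write but other users may not (e.g. mode 0660)."""
    return bool(mode & stat.S_IWUSR) and bool(mode & stat.S_IWGRP) and not is_world_writable(mode)
```

With a 0660 socket owned by the docker group, only members of that group (and root) can talk to the daemon.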
Having fixed the Google Compute Engine image build process's copying
of store paths in PR #24264, I ran `nixos-rebuild --upgrade switch`...
and the GCE image broke again, because it sets the NixOS configuration
option for the sysctl variable `kernel.yama.ptrace_scope` to
`mkDefault "1"`, i.e., with override priority 1000, and now the
`sysctl` module sets the same option to `mkDefault "0"` (this was
changed in commit 86721a5f78).
This patch raises the override priority of the Google Compute Engine
image configuration's definition of the Yama sysctl option to 500
(still lower than the priority of an unmodified option definition).
I have tested that this patch allows the Google Compute Engine image
to again build successfully for me.
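The fix relies on how the NixOS module system picks between definitions. A lower priority number wins: `mkDefault` is priority 1000, an unmodified definition is 100, and `mkForce` is 50. Two different values at the same winning priority are a conflict. The Python sketch below is a simplified model for a single-valued option like this sysctl. It ignores types that merge (such as lists), and the name `resolve` is hypothetical.

```python
from typing import Any, List, Tuple

MK_FORCE = 50      # mkForce
PLAIN = 100        # an unmodified option definition
MK_DEFAULT = 1000  # mkDefault


def resolve(definitions: List[Tuple[int, Any]]) -> Any:
    """Pick the value with the lowest priority number; equal winners must agree."""
    if not definitions:
        raise ValueError("option has no definitions")
    best = min(prio for prio, _ in definitions)
    winners = {value for prio, value in definitions if prio == best}
    if len(winners) > 1:
        # This is the situation the GCE image hit: two mkDefault values.
        raise ValueError(f"conflicting definitions at priority {best}: {sorted(winners)}")
    return winners.pop()
```

In this model, the GCE image's definition at priority 500 beats the sysctl module's `mkDefault "0"`. A plain (priority 100) definition in a user's configuration still overrides both.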
In `nixos/modules/virtualisation/google-compute-image.nix`, copy store
paths with `rsync -a` rather than `cp -prd`, because `rsync` seems
better able to handle the hard-links that may be present in the store,
whereas `cp` may fail to copy them.
I have tested that the Google Compute Engine image builds successfully
for me with this patch, whereas it did not without this patch.
This is the same fix applied for Azure images in commit
097ef6e435.
Fixes #23973.
We now make it happen later in the boot process so that multi-user
has already activated, so as to not run afoul of the logic in
switch-to-configuration.pl. It's not my favorite solution, but at
least it works. Also added a check to the VM test to catch the failure
so we don't break in future.
Fixes #23121
The initialization code is now a systemd service that explicitly
waits for network-online, so the occasional failure I was seeing
because the `nixos-rebuild` couldn't get anything from the binary
cache should stop. I hope!
fix #22709
Recent pvgrub (from Grub built with “--with-platform=xen”) understands
the Grub2 configuration format. Grub legacy configuration (menu.lst) is
ignored.
A very simple skeleton for now that doesn't attempt to model any of
the agent configuration, but we can grow it later. Tested and works
on an EC2 instance with ECS.
All the new options in detail:
Enabling docker in multi-user.target makes containers created with restart=always
start at boot. We still want socket activation, as it decouples dependencies between
the existence of /var/run/docker.sock and the docker daemon. This means that
services can rely on the availability of this socket. Fixes #11478 #21303
wantedBy = ["multi-user.target"];
This allows us to remove the postStart hack, as docker reports on its own when
it is ready.
Type=notify
The following unsets some limits, because overhead in the kernel's resource
accounting was observed. Note that these limits only apply to containerd.
Containers will have their own limits set.
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
Upgrades may require schema migrations. This can delay the startup of dockerd.
TimeoutStartSec=0
Allows docker to create its own cgroup subhierarchy to apply resource limits on
containers.
Delegate=true
When dockerd is killed, containers should not be affected, so that
`live restore` works.
KillMode=process
Overlayfs is quite a bit faster, e.g. with it the KDE 5 test takes ~7m
instead of ~30m on my laptop (which is still not great, since plain
9pfs is ~4m30s).
This works around:
machine: must succeed: nix-store -qR /run/current-system | grep nixos-
machine# error: changing ownership of path ‘/nix/store’: Invalid argument
Probably Nix shouldn't be anal about the ownership of the store unless
it's trying to build/write to the store.
http://hydra.nixos.org/build/45093872/nixlog/17/raw
(cherry picked from commit 57a0f14064)
Previously we were using two or three (qemu_kvm, qemu_test, and
qemu_test with a different dbus when minimal.nix is included).
(cherry picked from commit 8bfa4ce82e)
- most NixOS users only require time synchronisation,
while ntpd implements a batteries-included NTP server (1,215 lines of C code vs. 64,302)
- timesyncd supports an NTP server per interface (if configured through DHCP, for instance)
- timesyncd is already included in the systemd package; switching to it would
save a little disk space (1.5 MB)
A secret can be stored in a file. It is written at runtime in the
configuration file.
Note it is also possible to write them in the nix store for dev
purposes.
This commit introduces a nixos module for the Openstack Keystone
service. It also provides an optional bootstrap step that creates some
basic initial resources (tenants, endpoints,...).
The provided test starts Keystone by enabling bootstrapping and checks
if user creation works well.
This commit is based on initial work by domenkozar.
Allows one or more directories to be mounted as a read-only file system.
This makes it convenient to run volatile containers that do not retain
application state.
Fix automatic mouse grabbing/releasing when running as a vmware guest.
1. The xf86inputvmmouse is not loaded by default. Add it.
2. InputDevice sections which specify a driver are ignored if
AutoAddDevices is enabled (which it is by default). See [1]. Instead use
an InputClass to load the vmmouse driver.
[1] https://www.x.org/archive/X11R7.7/doc/man/man5/xorg.conf.5.xhtml#heading8
The reason to patch QEMU is that with latest Nix, tests like "printing"
or "misc" fail because they expect the store paths to be owned by uid 0
and gid 0.
Starting with NixOS/nix@5e51ffb1c2, Nix
builds inside of a new user namespace. Unfortunately this also means
that bind-mounted store paths that are part of the derivation's inputs
are no longer owned by uid 0 and gid 0 but by uid 65534 and gid 65534.
This in turn causes things like sudo or cups to fail with errors about
insecure file permissions.
So in order to avoid that, let's make sure the VM always gets files
owned by uid 0 and gid 0 and does a no-op when doing a chmod on a store
path.
In addition, this adds a virtualisation.qemu.program option so that we
can make sure that we only use the patched version if we're *really*
running NixOS VM tests (that is, whenever we have imported
test-instrumentation.nix).
Tested against the "misc" and "printing" tests.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
The dnsmasq instance run by the xen-bridge.service erroneously
hands out 172.16.0.0 as the netmask over DHCP to the VMs. This
commit removes the option responsible for that from dnsmasq.conf,
so that the proper netmask is inferred by dnsmasq instead.
Addresses https://github.com/NixOS/nixpkgs/issues/19883
The calls to iptables in xen-bridge.service were missing the -w switch,
which caused them to fail if another script was calling iptables
at the same time. Fix it by adding the -w switch.
Addresses https://github.com/NixOS/nixpkgs/issues/19849.
This adds the containers.<name>.enableTun option allowing containers to
access /dev/net/tun. This is required by openvpn, tinc, etc. in order to
work properly inside containers.
The new option builds on top of two generic options
containers.<name>.additionalCapabilities and
containers.<name>.allowedDevices which also can be used for example when
adding support for FUSE later down the road.
Get rid of the "or null" stuff. Also change 'cfg . "foo"' to 'cfg.foo'.
Also fixed what appears to be an actual bug: in postStartScript,
cfg.attribute (where attribute is a function argument) should be
cfg.${attribute}.
This introduces VirtualBox version 5.1.6 along with some refactoring,
notably:
* Kernel modules and user space applications are now separate
derivations.
* If config.pulseaudio doesn't exist in nixpkgs config, the default is
now to build with PulseAudio modules.
* A new updater to keep VirtualBox up to date.
All subtests in nixos/tests/virtualbox.nix succeed on my machine and
VirtualBox was reported to be working by @DamienCassou (although with
unrelated audio problems for another fix/branch) and @calbrecht.
- logDriver option, use journald for logging by default
- keep storage driver intact by default, as docker has sane defaults
- do not choose storage driver in tests, docker will choose by itself
- use dockerd binary as "docker daemon" command is deprecated and will be
removed
- add overlay2 to list of storage drivers
VirtualBox user space binaries now no longer reside in linuxPackages, so
let's use the package for the real user space binaries instead.
Tested using the following command:
nix-build nixos/release.nix -A ova.x86_64-linux
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Putting the kernel modules into the same output path as the main
VirtualBox derivation causes all of VirtualBox to be rebuilt on every
single kernel update.
The build process of VirtualBox already outputs the kernel module source
along with the generated files for the configuration of the main
VirtualBox package. We put this into a different output called "modsrc"
which we re-use from linuxPackages.virtualbox, which is now only
containing the resulting kernel modules without the main user space
implementation.
This not only has the advantage of decluttering the Nix expression for
the user space portions but also gets rid of the need to nuke references
and the need to patch out "depmod -a".
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Systemd upstream provides targets for networking. This also includes a target network-online.target.
In this PR I remove / replace most occurrences since some of them were even wrong and could delay startup.
Fixes #13927
cc @edolstra
configFile in make-disk-image clashes with clone-config, as the latter does
nothing if it finds a /etc/nixos/configuration.nix during stage 2.
With these changes, a container can have more than one veth pair. This allows, for example, having LAN and DMZ as bridges on the host and adding dedicated containers for proxies, an IPv4 firewall and an IPv6 firewall. Or having one bridge for normal WAN, one bridge for administration and one bridge for customer-internal communication, so that web-server containers can be reached from outside via HTTP and from management via SSH, and can talk to their database via the customer network.
The scripts to set up the containers are now rendered per container instead of from a single template. The scripts now contain per-container code to configure the extra veth interfaces. The default template without support for extra veths is still rendered for imperative containers.
A test is also included that checks whether extra veths can be placed into host bridges or can be reached via routing.
This makes the container a bit more secure by preventing root from
creating device nodes to access the host file system, for
instance. (Reference: systemd-nspawn@.service in systemd.)
This moves nixos-containers into its own package so that it can be
relied upon by other packages/systems. This should make development
using dynamic containers much easier.
We need to use wrapped modprobe, so that it finds the right
modules. Docker needs modprobe to load overlay kernel module
for example.
This fixes an error starting docker if the booted system's kernel
version is different from that of the /run/current-system profile.
Since systemd version 230, it is required to have a machine-id file
prior to the startup of the container. If the file is empty, a transient
machine ID is generated by systemd-nspawn.
See systemd/systemd#3014 for more details on the matter.
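The requirement above is simply that the file exists before systemd-nspawn starts, even if it is empty. The following Python sketch shows that precondition. The name `ensure_machine_id` is hypothetical and this is not the container start script's actual code. It leaves an existing ID untouched and otherwise creates an empty file, in which case systemd-nspawn generates a transient machine ID.

```python
import tempfile
from pathlib import Path


def ensure_machine_id(container_root: Path) -> Path:
    """Make sure <root>/etc/machine-id exists before the container is started."""
    machine_id = container_root / "etc" / "machine-id"
    machine_id.parent.mkdir(parents=True, exist_ok=True)
    # touch() creates an empty file if missing and never truncates an existing one.
    machine_id.touch(exist_ok=True)
    return machine_id


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for a container root.
    with tempfile.TemporaryDirectory() as tmp:
        path = ensure_machine_id(Path(tmp))
        print(path, repr(path.read_text()))
```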
This unbreaks all of the containers-* NixOS tests.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @edolstra
Closes: #15808
The existence of $root/var/lib/private/host-notify as a socket
prevented a bind mount:
container foo[8083]: Failed to create mount point /var/lib/containers/foo/var/lib/private/host-notify: No such device or address
This allows setting options for the same LUKS device in different
modules. For example, the auto-generated hardware-configuration.nix
can contain
boot.initrd.luks.devices.crypted.device = "/dev/disk/...";
while configuration.nix can add
boot.initrd.luks.devices.crypted.allowDiscards = true;
Also updated the examples/docs to use /dev/disk/by-uuid instead of
/dev/sda, since we shouldn't promote the use of the latter.
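Keying the devices by name is what lets two modules contribute to the same entry. With a list, each module's entry would be a separate element. The Python sketch below is a simplified model of that per-name merge. It is not the module system's real implementation, `merge_luks_devices` is a hypothetical name, and the by-uuid path in the test is a placeholder.

```python
from typing import Any, Dict

Devices = Dict[str, Dict[str, Any]]


def merge_luks_devices(*modules: Devices) -> Devices:
    """Merge per-device settings from several modules, keyed by device name."""
    merged: Devices = {}
    for module in modules:
        for name, settings in module.items():
            target = merged.setdefault(name, {})
            for key, value in settings.items():
                # Two modules setting the same attribute differently is a conflict.
                if key in target and target[key] != value:
                    raise ValueError(f"conflicting values for {name}.{key}")
                target[key] = value
    return merged
```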
Without the templating (which is still present for imperative containers), it
will be possible to set individual dependencies. Like depending on the network
only if the hostbridge or hardware interfaces are used.
Ported from #3021
This allows the containers to have their interface in a bridge on the host.
Also this adds IPv6 addresses to the containers both with bridged and unbridged
network.
NixOps has infrequent releases, so it's not the best place for keeping
the list of current AMIs. Putting them in Nixpkgs means that AMI
updates will be delivered as part of the NixOS channels.
We now generate a qcow2 image to prevent hitting Hydra's output size
limit. Also updated /root/user-data -> /etc/ec2-metadata/user-data.
http://hydra.nixos.org/build/33843133
Previously this was done in three derivations (one to build the raw
disk image, one to convert to OVA, one to add a hydra-build-products
file). Now it's done in one step to reduce the amount of copying
to/from S3. In particular, not uploading the raw disk image prevents
us from hitting hydra-queue-runner's size limit of 2 GiB.
Allow usage of list of strings instead of a comma-separated string
for filesystem options. Deprecate the comma-separated string style
with a warning message; convert this to a hard error after 16.09.
15.09 was just released, so this provides a deprecation period during
the 16.03 release.
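The transition described above accepts both forms and warns on the old one. The following Python sketch models that behaviour. The real change lives in the NixOS fileSystems module and is written in Nix. The name `normalize_fs_options` is hypothetical.

```python
import warnings
from typing import List, Union


def normalize_fs_options(options: Union[str, List[str]]) -> List[str]:
    """Accept a list of options, or (deprecated) a comma-separated string."""
    if isinstance(options, str):
        warnings.warn(
            "comma-separated filesystem options are deprecated; use a list of strings",
            DeprecationWarning,
            stacklevel=2,
        )
        # Drop empty entries produced by stray commas, e.g. "noatime,".
        return [opt for opt in options.split(",") if opt]
    return list(options)
```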
closes #10518
Signed-off-by: Robin Gloster <mail@glob.in>
This is a regression introduced by merging the EBS and S3 images. The
EBS images had a special marker /.ebs to prevent the initrd from using
ephemeral storage for the unionfs, but this marker was missing in the
consolidated image.
The fix is to check the file ami-manifest-path on the metadata server
to see if we're an S3-based instance. This does require networking in
the initrd.
Issue #12613.
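The check above can be sketched in Python as follows. The sketch assumes AWS's documented behaviour that EBS-backed instances report `(unknown)` for `ami-manifest-path`. It also assumes an unauthenticated (IMDSv1-style) GET; instances that enforce IMDSv2 would need a session token first. The function names are hypothetical, not the initrd's actual script.

```python
import urllib.request

METADATA_URL = "http://169.254.169.254/latest/meta-data/ami-manifest-path"


def is_s3_backed(manifest_path: str) -> bool:
    """S3 (instance-store) instances report a manifest path; EBS-backed ones report "(unknown)"."""
    return manifest_path.strip() not in ("", "(unknown)")


def fetch_manifest_path(timeout: float = 2.0) -> str:
    # Requires working networking, which is why the initrd has to bring it up first.
    with urllib.request.urlopen(METADATA_URL, timeout=timeout) as resp:
        return resp.read().decode()
```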
The default behavior with an m3.medium instance is to relocate
/nix and /tmp to /disk0 because an assumption is made that any
ephemeral disk is larger than the root volume. Rather than make
that assumption, add a check to see if the disk is larger, and
only then relocate /nix and /tmp.
This addresses https://github.com/NixOS/nixpkgs/issues/12613
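The size check amounts to a single comparison. The Python sketch below reads device sizes from sysfs, where `/sys/class/block/<dev>/size` is given in 512-byte sectors. The device names you would pass (such as the ephemeral disk and the root volume) and both function names are hypothetical.

```python
def block_device_size(dev: str) -> int:
    """Size in bytes of a block device such as "xvdb", from sysfs (512-byte sectors)."""
    with open(f"/sys/class/block/{dev}/size") as f:
        return int(f.read()) * 512


def should_relocate(ephemeral_bytes: int, root_bytes: int) -> bool:
    """Move /nix and /tmp to the ephemeral disk only if it is actually larger."""
    return ephemeral_bytes > root_bytes
```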
See http://nixos.org/nixpkgs/manual/#sec-package-naming
I've added an alias for multipath_tools to make sure that we don't break
existing configurations referencing the old name.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
When using `--ensure-unique-name`, don't needlessly append `"-0"` if the
container name is already unique.
This is especially helpful with NixOps since when it deploys to a
container it uses `--ensure-unique-name`. This means that the container
name will never match the deployment host due to the `"-0"`. Having the
container name and the host name match isn't exactly a requirement, but
it's nice to have and a small change.
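The new naming rule can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not the tool's actual code, and `unique_container_name` is a hypothetical name. A suffix is only appended when the requested name is taken.

```python
from typing import Set


def unique_container_name(base: str, existing: Set[str]) -> str:
    """Return base unchanged if it is free, otherwise base-0, base-1, ..."""
    if base not in existing:
        return base
    i = 0
    while f"{base}-{i}" in existing:
        i += 1
    return f"{base}-{i}"
```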
- add missing types in module definitions
- add missing 'defaultText' in module definitions
- wrap example with 'literalExample' where necessary in module definitions
Modifies libvirt package to search for configs in /var/lib and changes
libvirtd service to copy the default configs to the new location.
This enables the user to change e.g. the networking configuration with
virsh or virt-manager and keep those settings.
This reverts commit 6353f580f9.
Unfortunately cache=none doesn't work with all filesystem options.
Hydra tests error out with: file system may not support O_DIRECT
See http://hydra.nixos.org/build/30323625/
Setting nixosVersion to something custom is useful for meaningful GRUB
menus and /nix/store paths, but actually changing it rebuilds the
whole system path (because of the `nixos-version` script and manual
pages). Also, changing it is not a particularly good idea because you
can then be differentiated from other NixOS users by a lot of
programs that read /etc/os-release.
This patch introduces an alternative option that does all you want
from nixosVersion, but rebuilds only the very top system level and
/etc while using your label in the names of system /nix/store paths,
GRUB and other boot loaders' menus, getty greetings and so on.
The docker module used different code for socket-activated docker daemon than for the non-socket activated daemon.
In particular, if the socket-activated daemon is used, then modprobe wasn't set up to be usable and in PATH for
the docker daemon, which resulted in a failure to start the daemon with overlayfs as storageDriver if the
`overlay` kernel module wasn't already loaded. This commit fixes that bug (which only appears if socket
activation is used), and also reduces the duplication between code paths so that it's easier to keep
both in sync in future.
As @domenkozar noted in #10828, cache=writeback seems to do more harm
than good:
https://github.com/NixOS/nixpkgs/issues/10828#issuecomment-164426821
He has tested it using the openstack NixOS tests and found that
cache=none significantly improves startup performance.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This seems to be the root cause of the random page allocation failures
and @wizeman did a very good job on not only finding the root problem
but also giving a detailed explanation of it in #10828.
Here is an excerpt:
The problem here is that the kernel is trying to allocate a contiguous
section of 2^7=128 pages, which is 512 KB. This is way too much:
kernel pages tend to get fragmented over time and kernel developers
often go to great lengths to try allocating at most only 1 contiguous
page at a time whenever they can.
From the error message, it looks like the culprit is unionfs, but this
is misleading: unionfs is the name of the userspace process that was
running when the system ran out of memory, but it wasn't unionfs who
was allocating the memory: it was the kernel; specifically it was the
v9fs_dir_readdir_dotl() function, which is the code for handling the
readdir() function in the 9p filesystem (the filesystem that is used
to share a directory structure between a qemu host and its VM).
If you look at the code, here's what it's doing at the moment it tries
to allocate memory:
buflen = fid->clnt->msize - P9_IOHDRSZ;
rdir = v9fs_alloc_rdir_buf(file, buflen);
If you look into v9fs_alloc_rdir_buf(), you will see that it will try
to allocate a contiguous buffer of memory (using kzalloc(), which is a
wrapper around kmalloc()) of size buflen + 8 bytes or so.
So in reality, this code actually allocates a buffer of size
proportional to fid->clnt->msize. What is this msize? If you follow
the definition of the structures, you will see that it's the
negotiated buffer transfer size between 9p client and 9p server. On
the client side, it can be controlled with the msize mount option.
What this all means is that, the reason for running out of memory is
that the code (which we can't easily change) tries to allocate a
contiguous buffer of size more or less equal to "negotiated 9p
protocol buffer size", which seems to be way too big (in our NixOS
tests, at least).
After that initial finding, @lethalman tested the gnome3 gdm test
without setting the msize parameter at all and it seems to have resolved
the problem.
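The arithmetic in the excerpt can be checked with a small Python model. With 4 KiB pages, an order-7 allocation is 2^7 × 4096 = 524288 bytes (512 KiB). The sketch below computes the page order needed for the readdir buffer of a given msize. It assumes 4 KiB pages and `P9_IOHDRSZ = 24`. It also assumes the "+ 8 bytes or so" overhead mentioned above and that large kmalloc() requests are rounded up to a power-of-two number of pages. All names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64
P9_IOHDRSZ = 24   # assumption: the kernel's 9p I/O header size constant


def alloc_order(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Smallest order such that 2**order contiguous pages hold nbytes."""
    pages = -(-nbytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def readdir_alloc_order(msize: int, overhead: int = 8) -> int:
    """Approximate page order of the 9p readdir buffer for a negotiated msize."""
    buflen = msize - P9_IOHDRSZ
    return alloc_order(buflen + overhead)
```

Under these assumptions, an msize of 512 KiB leads to an order-7 request, matching the failure described above. Each doubling of msize raises the order by one, so leaving msize at its default keeps these allocations small.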
The reason I'm committing this without testing against all of the
NixOS VM tests is basically that I think this can only improve on the
current state, not make it worse.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>