The docker module used different code for socket-activated docker daemon than for the non-socket activated daemon.
In particular, if the socket-activated daemon is used, then modprobe wasn't set up to be usable and in PATH for
the docker daemon, which resulted in a failure to start the daemon with overlayfs as storageDriver if the
`overlay` kernel module wasn't already loaded. This commit fixes that bug (which only appears if socket
activation is used), and also reduces the duplication between code paths so that it's easier to keep
both in sync in future.
As @domenkozar noted in #10828, cache=writeback seems to do more harm
than good:
https://github.com/NixOS/nixpkgs/issues/10828#issuecomment-164426821
He has tested it using the openstack NixOS tests and found that
cache=none significantly improves startup performance.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This seems to be the root cause of the random page allocation failures
and @wizeman did a very good job on not only finding the root problem
but also giving a detailed explanation of it in #10828.
Here is an excerpt:
The problem here is that the kernel is trying to allocate a contiguous
section of 2^7=128 pages, which is 512 KB. This is way too much:
kernel pages tend to get fragmented over time and kernel developers
often go to great lengths to try allocating at most only 1 contiguous
page at a time whenever they can.
From the error message, it looks like the culprit is unionfs, but this
is misleading: unionfs is the name of the userspace process that was
running when the system ran out of memory, but it wasn't unionfs who
was allocating the memory: it was the kernel; specifically it was the
v9fs_dir_readdir_dotl() function, which is the code for handling the
readdir() function in the 9p filesystem (the filesystem that is used
to share a directory structure between a qemu host and its VM).
If you look at the code, here's what it's doing at the moment it tries
to allocate memory:
buflen = fid->clnt->msize - P9_IOHDRSZ;
rdir = v9fs_alloc_rdir_buf(file, buflen);
If you look into v9fs_alloc_rdir_buf(), you will see that it will try
to allocate a contiguous buffer of memory (using kzalloc(), which is a
wrapper around kmalloc()) of size buflen + 8 bytes or so.
So in reality, this code actually allocates a buffer of size
proportional to fid->clnt->msize. What is this msize? If you follow
the definition of the structures, you will see that it's the
negotiated buffer transfer size between 9p client and 9p server. On
the client side, it can be controlled with the msize mount option.
What this all means is that, the reason for running out of memory is
that the code (which we can't easily change) tries to allocate a
contiguous buffer of size more or less equal to "negotiated 9p
protocol buffer size", which seems to be way too big (in our NixOS
tests, at least).
After that initial finding, @lethalman tested the gnome3 gdm test
without setting the msize parameter at all and it seems to have resolved
the problem.
The reason why I'm committing this without testing against all of the
NixOS VM test is basically that I think we can only go better but not
worse than the current state.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
The most complex problems were from dealing with switches reverted in
the meantime (gcc5, gmp6, ncurses6).
It's likely that darwin is (still) broken nontrivially.
Commit 9bfe92ecee ("docker: Minor improvements, fix failing test") added
the services.docker.storageDriver option, made it mandatory but didn't
give it a default value. This results in an ugly traceback when users
enable docker, if they don't pay enough attention to also set the
storageDriver option. (An attempt was made to add an assertion, but it
didn't work, possibly because of how "mkMerge" works.)
The arguments against a default value were that the optimal value
depends on the filesystem on the host. This is, AFAICT, only in part
true. (It seems some backends are filesystem agnostic.) Also, docker
itself uses a default storage driver, "devicemapper", when no
--storage-driver=x options are given. Hence, we use the same value as
default.
Add a FIXME comment that 'devicemapper' breaks NixOS VM tests (for yet
unknown reasons), so we still run those with the 'overlay' driver.
Closes#10100 and #10217.
When using the ZFS storagedriver in docker, it shells out for the ZFS
commands. The path configuration for the systemd task does not include
ZFS, so if the driver is set to ZFS, add ZFS utilities to the PATH.
This will resolve https://github.com/NixOS/nixpkgs/issues/10127
[Bjørn: prefix commit message with "nixos/docker:", remove extra space
before ';']
The EBS and S3 (instance-store) AMIs are now created from the same
image. HVM instance-store AMIs are also generated.
Disk image generation has been factored out into a function
(nixos/lib/make-disk-image.nix) that can be used to build other kinds
of images.
Booting the demo/installer image won't work if the video memory is too
low. It boots into KDE, shows the background image and doesn't do
anything, according to @domenkozar.
Thanks to @domenkozar for reporting and testing this with 32MB.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
pkgs/os-specific/linux/kernel/common-config.nix defines HIGHMEM64G on
line 441 for 32bit systems, which implies PAE.
We now creating the OVA with PAE support enabled, which fixes bootup of
the image if people are just importing it without setting PAE
explicitly.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Commit 687caeb renamed services.virtualboxHost to programs.virtualbox,
but according to the discussion on the commit, it's probably a better to
put it into virtualisation.virtualbox instead.
The discussion can be found here:
https://github.com/NixOS/nixpkgs/commit/687caeb#commitcomment-12664978
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This removes all references to .../sbin for the guest additions and also
installs all binaries to .../bin instead (so no more .../sbin).
The main motivation for doing this is commit 98cedb3 (which
unfortunately had to be reverted in a9f2e10) and pull request #9063,
where the latter is an initial effort to move mount.vboxsf to .../bin
instead of .../sbin.
The commit I made afterwards is finishing the removal of .../sbin
entirely.
In 14f09e0, I've introduced the module under modules/programs, because
the legacy virtualbox.nix was also under that path. But because we
already have modules/virtualisation/virtualbox-guest.nix, it really
makes sense to put this module alongside of it as well.
This module thus has no change in functionality and I've tested
evaluation against nixos/tests/virtualbox.nix and the manual.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Using $storepath/sbin is deprecated according to commit 98cedb3, so
let's avoid putting anything in .../sbin for the guest additions.
This is a continuation of the initial commit done by @ctheune at
1fb1360, which unfortunately broke VM tests and only changed the path of
the mount.vboxsf helper.
With this commit, the VM test is fixed and I've also verified on my
machine that it is indeed working again.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Xen required a few changes in order to be usable:
* Include xenfs module in initrd as loading it in the activation
script was failing.
* Include /etc/default/xendomains, which is needed by
xen-domains service.
* Create /var/log/xen and /var/lib/xen directories in
the xen-store service, which are needed by the xl command.
The directories could be created by any other script as long as
they are guaranteed to exist before xl is called.
* Fix a reference to /bin/ls in the xendomains script.
We already have separate tests for checking whether the ISO boots
correctly, so it's not necessary to do that here. So now
tests/installer.nix just tests nixos-install, from a regular NixOS VM
that uses the host's Nix store. This makes running the tests more
convenient because we don't have to build a new ISO after every
change.
The issue was that grub was not building the default entry which would
leave systems unbootable. This can now be safely reverted as the default
entry is being built once again.
This reverts commit fd1fb0403c.
If the host is shutting down, machinectl may fail because it's
bus-activated and D-Bus will be shutting down. So just send a signal
to the leader process directly.
Fixes#6212.
The Nixos Qemu VM that are used for VM tests can now start without
boot menu even when using a bootloader.
The Nixos Qemu VM with bootloader can emulate a EFI boot now.
- Create container nixos profile
- Create lxc-container nixos config using container nixos profile
- Docker nixos image, use nixos profile for its base config
Especially new users could be confused by this, so we're now marking
services.virtualbox.enable as obsolete and defaulting to
services.virtualboxGuest.enable instead. I believe this now makes it
clear, that this option is for guest additions only.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Regression introduced in f496c3cbe4.
Previously when we used security.initialRootPassword, the default
priority for this option was 1001, because it was a default value set by
the option itself.
With the mentioned commit, it is no longer an option default but a
mkDefault, which is priority 1000.
I'm setting this to 150 now, as test-instrumentation.nix is using this
for overriding other options and because I think it still makes it
possible to simple-override it, because if no priority is given, we get
priority 100.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This fixes the issue when the LXC emulator binary is garbage collected
and breaks libvirtd containers, because libvirtd XML file still refers
to GC'ed store path.
We already have a fix for QEMU, this commit extends the fix to cover LXC
too.
This tells the sad tale of @the-kenny who had bind-mounted his home
directory into a container. After doing `nixos-container destroy` he
discovered that his home directory went from "full of precious data" to
"no more data".
We want to avoid having similar sad tales in the future, so this now also
check this in the containers VM test.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Fixes a leftover from 330fadb706.
We're using systemd dbus notifications now and this leftover caused the
startup notification to fail.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This allows creating a container from an existing system store path,
which is especially nice for NixOps-deployed hosts because they don't
need a Nixpkgs tree anymore.
Systemd in a container will call sd_notify when it has finished
booting, so we can use that to signal that the container is
ready. This does require some fiddling with $NOTIFY_SOCKET.
Previously "machinectl reboot/poweroff" brutally killed the container,
as did "systemctl stop/restart". And reboot didn't actually work. Now
everything is fine.
Previously "machinectl reboot/poweroff" brutally killed the container,
as did "systemctl stop/restart". And reboot didn't actually work. Now
everything is fine.
curl does not retry if it is unable to connect to the metadata server.
For some reason, when creating a new AMI with a recent nixpkgs, the
metadata server would not be available when fetch-ec2-data ran. Switching
to wget that can retry even on TCP connection errors solved this problem.
I also made the fetch-ec2-data depend on ip-up.target, to get it to start
a bit later.
Removed the 'wait for GCE metadata service' job, as it was causing
issues with the metadata service (likely some firewall or something).
In stead, use wget with retries (including connection refused) in
stead or curl for fetching the SSH keys. Also made the stdout/-err
of this job appear in the console.
This version of module has disabled socketActivation, because until
nixos upgrade systemd to at least 214, systemd does not support
SocketGroup. So socket is created with "root" group when
socketActivation enabled. Should be fixed as soon as systemd upgraded.
Includes changes from #3015 and supersedes #3028
By setting a line like
MACVLANS="eno1"
in /etc/containers/<name>.conf, the container will get an Ethernet
interface named mv-eno1, which represents an additional MAC address on
the physical eno1 interface. Thus the container has direct access to
the physical network. You can specify multiple interfaces in MACVLANS.
Unfortunately, you can't do this with wireless interfaces.
Note that dhcpcd is disabled in containers by default, so you'll
probably want to set
networking.useDHCP = true;
in the container, or configure a static IP address.
To do: add a containers.* option for this, and a flag for
"nixos-container create".
Fixes#2379.
The new name was a misnomer because the values really are X11 video
drivers (e.g. ‘cirrus’ or ‘nvidia’), not OpenGL implementations. That
it's also used to set an OpenGL implementation for kmscon is just
confusing overloading.
By default, socat only waits 0.5s for the remote side to finish after
getting EOF on the local side. So don't close the local side, instead
wait for socat to exit when the remote side finishes.
http://hydra.nixos.org/build/10663282
By enabling ‘services.openssh.startWhenNeeded’, sshd is started
on-demand by systemd using socket activation. This is particularly
useful if you have a zillion containers and don't want to have sshd
running permanently. Note that socket activation is not noticeable
slower, contrary to what the manpage for ‘sshd -i’ says, so we might
want to make this the default one day.
The ability for unprivileged users to mount external media is useful
regardless of the desktop environment. Also, since udisks2 is
activated on-demand, it doesn't add any overhead if you're not using it.
Apparently systemd is now smart enough to figure out predictable names
for QEMU network interfaces. But since our tests expect them to be
named eth0/eth1..., this is not desirable at the moment.
http://hydra.nixos.org/build/10418789
This used to work with systemd-nspawn 203, because it bind-mounted
/etc/resolv.conf (so openresolv couldn't overwrite it). Now it's just
copied, so we need some special handling.
Using pkgs.lib on the spine of module evaluation is problematic
because the pkgs argument depends on the result of module
evaluation. To prevent an infinite recursion, pkgs and some of the
modules are evaluated twice, which is inefficient. Using ‘with lib’
prevents this problem.
Update VirtualBox (and implicitly VirtualBox Guest Additions) to 4.3.6
and Oracle VM VirtualBox Extension Pack to 91406
Conflicts due to minor upgrade in the mean time
Conflicts:
nixos/modules/virtualisation/virtualbox-guest.nix
pkgs/applications/virtualization/virtualbox/default.nix
pkgs/applications/virtualization/virtualbox/guest-additions/default.nix
Needed for the installer tests, since otherwise mounting a filesystem
may fail as it has a last-mounted date in the future.
http://hydra.nixos.org/build/9846712
The command nixos-container can now create containers. For instance,
the following creates and starts a container named ‘database’:
$ nixos-container create database
The configuration of the container is stored in
/var/lib/containers/<name>/etc/nixos/configuration.nix. After editing
the configuration, you can make the changes take effect by doing
$ nixos-container update database
The container can also be destroyed:
$ nixos-container destroy database
Containers are now executed using a template unit,
‘container@.service’, so the unit in this example would be
‘container@database.service’.
For example, the following sets up a container named ‘foo’. The
container will have a single network interface eth0, with IP address
10.231.136.2. The host will have an interface c-foo with IP address
10.231.136.1.
systemd.containers.foo =
{ privateNetwork = true;
hostAddress = "10.231.136.1";
localAddress = "10.231.136.2";
config =
{ services.openssh.enable = true; };
};
With ‘privateNetwork = true’, the container has the CAP_NET_ADMIN
capability, allowing it to do arbitrary network configuration, such as
setting up firewall rules. This is secure because it cannot touch the
interfaces of the host.
The helper program ‘run-in-netns’ is needed at the moment because ‘ip
netns exec’ doesn't quite do the right thing (it remounts /sys without
bind-mounting the original /sys/fs/cgroups).
These are stored on the host in
/nix/var/nix/{profiles,gcroots}/per-container/<container-name> to
ensure that container profiles/roots are not garbage-collected.
On the host, you can run
$ socat unix:<path-to-container>/var/lib/login.socket -,echo=0,raw
to get a login prompt. So this allows logging in even if the
container has no SSH access enabled.
You can also do
$ socat unix:<path-to-container>/var/lib/root-shell.socket -
to get a plain root shell. (This socket is only accessible by root,
obviously.) This makes it easy to execute commands in the container,
e.g.
$ echo reboot | socat unix:<path-to-container>/var/lib/root-shell.socket -
It is parameterized by a function that takes a name and evaluates to the
option type for the attribute of that name. Together with
submoduleWithExtraArgs, this subsumes nixosSubmodule.
This is a rather large commit that switches user/group creation from using
useradd/groupadd on activation to just generating the contents of /etc/passwd
and /etc/group, and then on activation merging the generated files with the
files that exist in the system. This makes the user activation process much
cleaner, in my opinion.
The users.extraUsers.<user>.uid and users.extraGroups.<group>.gid must all be
properly defined (if <user>.createUser is true, which it is by default). My
pull request adds a lot of uids/gids to config.ids to solve this problem for
existing nixos services, but there might be configurations that break because
this change. However, this will be discovered during the build.
Option changes introduced by this commit:
* Remove the options <user>.isSystemUser and <user>.isAlias since
they don't make sense when generating /etc/passwd statically.
* Add <group>.members as a complement to <user>.extraGroups.
* Add <user>.passwordFile for setting a user's password from an encrypted
(shadow-style) file.
* Add users.mutableUsers which is true by default. This means you can keep
managing your users as previously, by using useradd/groupadd manually. This is
accomplished by merging the generated passwd/group file with the existing files
in /etc on system activation. The merging of the files is simplistic. It just
looks at the user/group names. If a user/group exists both on the system and
in the generated files, the system entry will be kept un-changed and the
generated entries will be ignored. The merging itself is performed with the
help of vipw/vigr to properly lock the account files during edit.
If mutableUsers is set to false, the generated passwd and group files will not
be merged with the system files on activation. Instead they will simply replace
the system files, and overwrite any changes done on the running system. The
same logic holds for user password, if the <user>.password or
<user>.passwordFile options are used. If mutableUsers is false, password will
simply be replaced on activation. If true, the initial user passwords will be
set according to the configuration, but existing passwords will not be touched.
I have tested this on a couple of different systems and it seems to work fine
so far. If you think this is a good idea, please test it. This way of adding
local users has been discussed in issue #103 (and this commit solves that
issue).
libvirtd puts the full path of the emulator binary in the machine config
file. But this path can unfortunately be garbage collected while still
being used by the virtual machine. Then this happens:
Error starting domain: Cannot check QEMU binary /nix/store/z5c2xzk9x0pj6x511w0w4gy9xl5wljxy-qemu-1.5.2-x86-only/bin/qemu-kvm: No such file or directory
Fix by updating the emulator path on each service startup to something
valid (re-scan $PATH).
You can now say:
systemd.containers.foo.config =
{ services.openssh.enable = true;
services.openssh.ports = [ 2022 ];
users.extraUsers.root.openssh.authorizedKeys.keys = [ "ssh-dss ..." ];
};
which defines a NixOS instance with the given configuration running
inside a lightweight container.
You can also manage the configuration of the container independently
from the host:
systemd.containers.foo.path = "/nix/var/nix/profiles/containers/foo";
where "path" is a NixOS system profile. It can be created/updated by
doing:
$ nix-env --set -p /nix/var/nix/profiles/containers/foo \
-f '<nixos>' -A system -I nixos-config=foo.nix
The container configuration (foo.nix) should define
boot.isContainer = true;
to optimise away the building of a kernel and initrd. This is done
automatically when using the "config" route.
On the host, a lightweight container appears as the service
"container-<name>.service". The container is like a regular NixOS
(virtual) machine, except that it doesn't have its own kernel. It has
its own root file system (by default /var/lib/containers/<name>), but
shares the Nix store of the host (as a read-only bind mount). It also
has access to the network devices of the host.
Currently, if the configuration of the container changes, running
"nixos-rebuild switch" on the host will cause the container to be
rebooted. In the future we may want to send some message to the
container so that it can activate the new container configuration
without rebooting.
Containers are not perfectly isolated yet. In particular, the host's
/sys/fs/cgroup is mounted (writable!) in the guest.
Fixes this:
Nov 09 16:18:54 nixos-laptop systemd[1]: Starting Libvirt Virtual Machine Management Daemon...
Nov 09 16:18:54 nixos-laptop dnsmasq[15809]: read /etc/hosts - 2 addresses
Nov 09 16:18:54 nixos-laptop dnsmasq[15809]: failed to load names from /var/lib/libvirt/dnsmasq/default.addnhosts: Permission denied
Nov 09 16:18:54 nixos-laptop dnsmasq[15809]: cannot read /var/lib/libvirt/dnsmasq/default.hostsfile: Permission denied
Nov 09 16:18:55 nixos-laptop systemd[1]: Started Libvirt Virtual Machine Management Daemon.
I don't understand the reason for the original 700 permission bits.
Apparently read-access is needed and Ubuntu also use 755 perms.
Use "chmod" instead of "mkdir -m" to set permissions because mkdir doesn't
modify permissions on existing directories.
(systemd service descriptions that is, not service descriptions in "man
configuration.nix".)
Capitalizing each word in the description seems to be the accepted
standard.
Also shorten these descriptions:
* "Munin node, the agent process" => "Munin Node"
* "Planet Venus, an awesome ‘river of news’ feed reader" => "Planet Venus Feed Reader"
A null password allows logging into local PAM services such as "login"
(agetty) and KDM. That's not actually a security problem for EC2
machines, since they do not have "local" logins; for VirtualBox
machines, if you local access, you can do anything anyway. But it's
better to be on the safe side and disable password-based logins for
root.
Now that overriding fileSystems in qemu-vm.nix works again, it's
important that the VM tests that add additional file systems use the
same override priority. Instead of using the same magic constant
everywhere, they can now use mkVMOverride.
http://hydra.nixos.org/build/6695561
Virsh/virt-manager uses ssh to connect to master, there it expects openbsd netcat(which
has support for unix sockets) to be avalible, to make a tunnel.
Close#1087.
It requires a writable /nix/store to store the build result. Also,
wait until we've reached multi-user.target before doing the build, and
do a sync at the end to ensure all data to $out is properly written.
http://hydra.nixos.org/build/6496716