If run as root we were leaking mounts to the parent namespace,
which lead to an error when removing the temporary mountroot.
To fix this we remount the whole tree as private as soon as we created
the new mountenamespace.
To avoid symlink loops to /host in nested chrootenvs we need to remove
one level of indirection. This is also what's generally expected of
/host contents.
* Remove unused argument from pivot_root;
* Factor out tmpdir creation into a separate function;
* Remove unused fstype from bind mount;
* Use unlink instead of a treewalk to remove empty temporary directory.
The problem with stacking chrootenv before was that CLONE_NEWUSER cannot
be used when a child uses chroot. So instead of that we use pivot_root
which replaces root in the whole namespace. This requires our new root
to be an actual fs so we mount tmpfs.
glibc 2.27 (and possibly other versions) can't handle an `nopenfd` value larger than 2^19 in `ntfw`, which is problematic if you've set the maximum number of fds per process to a value higher than that.
Changes:
* doesn't handle root user separately
* doesn't chdir("/") which makes using it seamless
* only bind mounts, doesn't symlink (i.e. files)
Incidentally, fixes#33106.
It's about two times shorter than the previous version, and much
easier to read/follow through. It uses GLib quite heavily, along with
RAII (available in GCC/Clang).