This patch preserves the ordering of layers of a parent image when the
new image is packed.
It is currently not the case: layers are stacked in the reverse order.
Fixes#55290
Docker images used to be, essentially, a linked list of layers. Each
layer would have a tarball and a json document pointing to its parent,
and the image pointed to the top layer:
imageA ----> layerA
|
v
layerB
|
v
layerC
The current image spec changed this format to where the Image defined
the order and set of layers:
imageA ---> layerA
|--> layerB
`--> layerC
For backwards compatibility, docker produces images which follow both
specs: layers point to parents, and images also point to the entire
list:
imageA ---> layerA
| |
| v
|--> layerB
| |
| v
`--> layerC
This is nice for tooling which supported the older version and never
updated to support the newer format.
Our `buildImage` code only supported the old version, so in order for
`buildImage` to properly generate an image based on another image
with `fromImage`, the parent image's layers must fully support the old
mechanism.
This is not a problem in general, but is a problem with
`buildLayeredImage`.
`buildLayeredImage` creates images with newer image spec, because
individual store paths don't have a guaranteed parent layer. Including
a specific parent ID in the layer's json makes the output less likely
to cache hit when published or pulled.
This means until now, `buildLayeredImage` could not be the input to
`buildImage`.
The changes in this PR change `buildImage` to only use the layer's
manifest when locating parent IDs. This does break buildImage on
extremely old Docker images, though I do wonder how many of these
exist.
This work has been sponsored by Target.
Since Nix 2 is now the stable Nix version, we can use closureInfo
which simplifies the Nix database initialisation (size and hash are
included in the "dump").
Create a many-layered Docker Image.
Implements much less than buildImage:
- Doesn't support specific uids/gids
- Doesn't support runninng commands after building
- Doesn't require qemu
- Doesn't create mutable copies of the files in the path
- Doesn't support parent images
If you want those feature, I recommend using buildLayeredImage as an
input to buildImage.
Notably, it does support:
- Caching low level, common paths based on a graph traversial
algorithm, see referencesByPopularity in
0a80233487993256e811f566b1c80a40394c03d6
- Configurable number of layers. If you're not using AUFS or not
extending the image, you can specify a larger number of layers at
build time:
pkgs.dockerTools.buildLayeredImage {
name = "hello";
maxLayers = 128;
config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
};
- Parallelized creation of the layers, improving build speed.
- The contents of the image includes the closure of the configuration,
so you don't have to specify paths in contents and config.
With buildImage, paths referred to by the config were not included
automatically in the image. Thus, if you wanted to call Git, you
had to specify it twice:
pkgs.dockerTools.buildImage {
name = "hello";
contents = [ pkgs.gitFull ];
config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
};
buildLayeredImage on the other hand includes the runtime closure of
the config when calculating the contents of the image:
pkgs.dockerTools.buildImage {
name = "hello";
config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
};
Minor Problems
- If any of the store paths change, every layer will be rebuilt in
the nix-build. However, beacuse the layers are bit-for-bit
reproducable, when these images are loaded in to Docker they will
match existing layers and not be imported or uploaded twice.
Common Questions
- Aren't Docker layers ordered?
No. People who have used a Dockerfile before assume Docker's
Layers are inherently ordered. However, this is not true -- Docker
layers are content-addressable and are not explicitly layered until
they are composed in to an Image.
- What happens if I have more than maxLayers of store paths?
The first (maxLayers-2) most "popular" paths will have their own
individual layers, then layer #(maxLayers-1) will contain all the
remaining "unpopular" paths, and finally layer #(maxLayers) will
contain the Image configuration.
Because dates are an impurity, by default buildImage will use a static
date of one second past the UNIX Epoch. This can be a bit frustrating
when listing docker images in the CLI:
$ docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
hello latest 08c791c7846e 48 years ago 25.2MB
If you want to trade the purity for a better user experience, you can
set created to now.
pkgs.dockerTools.buildImage {
name = "hello";
tag = "latest";
created = "now";
contents = pkgs.hello;
config.Cmd = [ "/bin/hello" ];
}
and now the Docker CLI will display a reasonable date and sort the
images as expected:
$ docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
hello latest de2bf4786de6 About a minute ago 25.2MB
docker-tools tests load images without specifying any tag
value. Docker then uses the image with tag "latest" which doesn't
exist anymore since commit 39e678e24e.
Attributes `imageName` and `imageTag` are exposed if the image is
built by our Nix tools but not if the image is pulled. So, we expose
these attributes for convenience and homogeneity.
Skopeo used by our docker tools was patched to work in the build
sandbox (it used /var/tmp which is not available in the sandbox).
Since this temporary directory can now be set at build time, we remove
the patch from our docker tools.
The extraCommands was, previously, simply put in the body of the script
using nix expansion `${extraCommands}` (which looks exactly like bash
expansion!).
This causes issues like in #34779 where scripts will eventually create
invalid bash.
The solution is to use a script like `run-as-root`.
* * *
Fixes#34779
Regression introduced in 736848723e.
This commit most certainly hasn't been tested with sandboxing enabled
and breaks not only pullImage but also the docker-tools NixOS VM test
because it doesn't find it's certificate path and also relies on
/var/tmp being there.
Fixing the certificate path is the easiest one because it can be done
via environment variable.
I've used overrideAttrs for changing the hardcoded path to /tmp (which
is available in sandboxed builds and even hardcoded in Nix), so that
whenever someone uses Skopeo from all-packages.nix the path is still
/var/tmp.
The reason why this is hardcoded to /var/tmp can be seen in a comment in
vendor/github.com/containers/image/storage/storage_image.go:
Do not use the system default of os.TempDir(), usually /tmp, because
with systemd it could be a tmpfs.
With sandboxed builds this isn't the case, however for using Nix without
NixOS this could turn into a problem if this indeed is the case.
So in the long term this needs to have a proper solution.
In addition to that, I cleaned up the expression a bit.
Tested by building dockerTools.examples.nixFromDockerHub and the
docker-tools NixOS VM test.
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @nlewo, @Mic92, @Profpatsch, @globin, @LnL7
Skopeo is used to pull images from a Docker registry (instead of a
Docker deamon in a VM).
An image reference is specified with its name and its digest which is
an immutable image identifier (unlike image name and tag).
Skopeo can be used to get the digest of an image, for instance:
$ skopeo inspect docker://docker.io/nixos/nix:1.11 | jq -r '.Digest'
This is to go to a reproducible image build.
Note without this options image are identical from the Docker point of
view but generated docker archives could have different hashes.
This is to improve image creation reproducibility. Since the nar
format doesn't support hard link, the tar stream of a layer can be
different if a dependency of a layer has been built locally or if it
has been fetched from a binary cache.
If the dependency has been build locally, it can contain hard links
which are encoded in the tar stream. If the dependency has been
fetched from a binary cache, the tar stream doesn't contain any hard
link. So even if the content is the same, tar streams are different.
We were using 'Combined Image JSON + Filesystem Changeset Format' [1] to
unpack and pack image and this patch switches to the format used by the registry.
We used the 'repository' file which is not generated by Skopeo when it
pulls an image. Moreover, all information of this file are also in the
manifest.json file.
We then use the manifest.json file instead of 'repository' file. Note
also the manifest.json file is required to push an image with Skopeo.
Fix#29636
[1] 749d90e10f/image/spec/v1.1.md (combined-image-json--filesystem-changeset-format)
The database dump doesn't contain sha and size. This leads to invalid
path in the container. We have to fix the database by using
nix-store.
Note a better way to do this is available in Nix 1.12 (since the
database dump contains all required information).
We also add content output paths in the gcroots since they ca be used
by the container.
Currently, the contents closure is copied to the layer but there is no
nix database initialization. If pkgs.nix is added in the contents,
nix-store doesn't work because there is no nix database.
From the contents of the layer, this commit generates and loads the
database in the nix store of the container. This only works if there
is no parent layer that already have a nix store (to support several
nix layers, we would have to merge nix databases of parent layers).
We also add an example to play with the nix store inside the
container. Note it seems `more` is a missing dependency of the nix
package!
Before this patch, a VM was used to spawn docker that pulled the
VM. Now, the tool Skopeo does this job well so we can simplify our
dockerTools since we doesn't need Docker anymore:)
This also fixe the regression described in
https://github.com/NixOS/nixpkgs/issues/29271 : cntlm proxy doesn't
work in 17.09 while it worked in 17.03.
Note Skopeo doesn't produce the same output than docker pull so, we
have to update sha.
To wait for the docker deamon, curl requests are sent. However, if a
http proxy is set, it will respond instead of the docker daemon.
To avoid this, we send docker ps command instead of curl command.
The image json is not exactly the same as the layer json, therefore I
changed the implementation to use the `baseJson` which doesn’t include
layer specific details like `id`, `size` or the checksum of the layer.
Also the `history` entry was missing in the image json. I’m not totally
sure if this field is required, but a I got an error from a docker
registry when I’ve tried to receive the distribution manifest of an
image without those `history` entry:
GET: `http://<registry-host>/v2/<imageName>/manifests/<imageTag>`
```json
{
"errors": [
{
"code": "MANIFEST_INVALID",
"message": "manifest invalid",
"detail": {}
}
]
}
```
I’ve also used a while loop to iterate over all layers which should make
sure that the order of the layers is correct. Previously `find` was
used and I’m not sure if the order was always correct.
If the base image has been built with nixpkgs.dockerTools, the image
configuration and manifest are readonly so we first need to change
their permissions before removing them.
Fix#27632.
The docker loading (docker 1.12.6) of an image with uppercase in the
name fails with the following message:
invalid reference format: repository name must be lowercase
This patch fixes file modification times to $SOURCE_DATE_EPOCH, and
ensures that files originating from the store are owned by root:root.
Both changes improve reproducibility, and the latter allows proper
building on a host where the store is owned by a non-root user.
When building an image with multiple layers, files
already included in an underlying layer are supposed to
be excluded from the current layer. However, some subtleties
in the way filepaths are compared seem to be blocking this.
Specifically:
* tar generates relative filepaths with directories ending in '/'
* find generates absolute filepaths with no trailing slashes on directories
That is, paths extracted from the underlying tarball look like:
nix/store/.../foobar/
whereas the layer being generated uses paths like:
/nix/store/.../foobar
This patch modifies the output of "tar -t" to match the latter format.
The /nix path in 4d200538 of the layer tar didn't exist for some
packages, such as cacert. This is because cacert just creates an /etc
directory and doesn't depend on any other /nix paths. If we tried
putting this directory in the tar and using overlayfs with it, we'd get
"Invalid argument" when trying to remove the directory.
We now check whether the closure is non-empty before telling tar to
store the /nix directory.
Fixes#14710.
There were two sources of non-determinisim coming into the images. The
first was tar mtimes, the second was pigz/gzip times.
An example image now passes with the --check flag.
I think the intention of this functionality was to provide a simple
alternative to the "runAsRoot" and "contents" attributes.
The implementation caused very slow builds of Docker images. Almost all
of the build time was spent in IO for tar, due to tarballs being
created, immediately extracted, then recreated. I had 30 minute builds
on some of my images which are now down to less than 2 minutes. A couple
of other users on #nix IRC have observed similar improvements.
The implementation also mutated the produced Docker layers without
changing their hashes. Using non-empty tarballs would produce images
which got cached incorrectly in Docker.
I have a commit which just fixes the performance problem but I opted to
completely remove the tarball feature after I found out that it didn't
correctly implement the Docker Image Specification due to the broken
hashing.
* authorization token is optional
* registry url is taken from X-Docker-Endpoints header
* pull.sh correctly resumes partial layer downloads
* detjson.py does not fail on missing keys