{ config, lib, pkgs, options, ... }:

let
  # security/acme: limit concurrent certificate generations
  #
  # fixes #232505
  #
  # Implements the new option `security.acme.maxConcurrentRenewals` to limit
  # the number of certificate generation (or renewal) jobs that can run in
  # parallel. This avoids overloading the system resources with many
  # certificates or running into acme registry rate limits and network
  # timeouts.
  #
  # Architecture considerations:
  # - simplicity, lightweight: Concerns have been voiced about making this
  #   already rather complex module even more convoluted. Additionally,
  #   locking solutions shall not significantly increase the runtime overhead
  #   and footprint of individual job runs.
  #   To accommodate these concerns, this solution is implemented purely in
  #   Nix, bash, and using the light-weight `flock` util. To reduce
  #   complexity, jobs are already assigned their lockfile slot at system
  #   build time instead of dynamic locking and retrying. This comes at the
  #   cost of not always maxing out the permitted concurrency at runtime.
  # - no stale locks: Limiting concurrency via a locking mechanism is usually
  #   approached with semaphores. Unfortunately, both SysV as well as
  #   POSIX-Semaphores are *not* released when the process currently locking
  #   them is SIGKILLed. This poses the danger of stale locks staying around
  #   and certificate renewal being blocked from running altogether.
  #   `flock` locks though are released when the process holding the file
  #   descriptor of the lock file is KILLed or terminated.
  # - lockfile generation: Lock files could either be created at build time
  #   in the Nix store or at script runtime in an idempotent manner.
  #   While the former would be simpler to achieve, we might exceed the number
  #   of permitted concurrent runs during a system switch: Already running
  #   jobs are still locked on the existing lock files, while jobs started
  #   after the system switch will acquire locks on freshly created files,
  #   not being blocked by the still running services.
  #   For this reason, locks are generated and managed at runtime in the
  #   shared state directory `/var/lib/locks/`.
  #
  # nixos/security/acme: move locks to /run
  #   also, move over permission and directory management to systemd-tmpfiles
  # nixos/security/acme: fix some linter remarks in my code
  #   there are some remarks left for existing code, not touching that
  # nixos/security/acme: redesign script locking flow
  #   - get rid of subshell
  #   - provide function for wrapping scripts in a locked environment
  # nixos/acme: improve visibility of blocking on locks
  # nixos/acme: add smoke test for concurrency limitation
  #   heavily inspired by m1cr0man
  # nixos/acme: release notes entry on new concurrency limits
  # nixos/acme: cleanup, clarifications
  # 2023-07-18 09:20:33 +00:00

  cfg = config.security.acme;
  opt = options.security.acme;
  user = if cfg.useRoot then "root" else "acme";

  # Used to calculate timer accuracy for coalescing
  numCerts = lib.length (builtins.attrNames cfg.certs);
  _24hSecs = 60 * 60 * 24;

  # Used to make unique paths for each cert/account config set
  mkHash = with builtins; val: lib.substring 0 20 (hashString "sha256" val);
  mkAccountHash = acmeServer: data: mkHash "${toString acmeServer} ${data.keyType} ${data.email}";
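  # Illustrative sketch, not part of the upstream module: the binding name
  # `_exampleHashLength` and the input string are hypothetical. A SHA-256 hex
  # digest has 64 characters and mkHash keeps the first 20 of them, so this
  # evaluates to 20 regardless of the input.
  _exampleHashLength = builtins.stringLength (mkHash "https://acme.example/directory ec256 admin@example.org");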
  accountDirRoot = "/var/lib/acme/.lego/accounts/";

  lockdir = "/run/acme/";
  concurrencyLockfiles = map (n: "${toString n}.lock") (lib.range 1 cfg.maxConcurrentRenewals);
  # Assign elements of `baseList` to each element of `needAssignmentList`, until the latter is exhausted.
  # returns: [{fst = "element of baseList"; snd = "element of needAssignmentList"}]
  roundRobinAssign = baseList: needAssignmentList:
    if baseList == [] then []
    else _rrCycler baseList baseList needAssignmentList;
  _rrCycler = with builtins; origBaseList: workingBaseList: needAssignmentList:
    if (workingBaseList == [] || needAssignmentList == [])
    then []
    else
      [{ fst = head workingBaseList; snd = head needAssignmentList;}] ++
      _rrCycler origBaseList (if (tail workingBaseList == []) then origBaseList else tail workingBaseList) (tail needAssignmentList);
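  # Illustrative sketch, not part of the upstream module: the binding name
  # `_exampleRoundRobin` and both input lists are hypothetical. With two lock
  # files and three certificates the assignment wraps around, so the third
  # certificate shares the first lock file. The expression evaluates to:
  #   [ { fst = "1.lock"; snd = "a.example"; }
  #     { fst = "2.lock"; snd = "b.example"; }
  #     { fst = "1.lock"; snd = "c.example"; } ]
  _exampleRoundRobin = roundRobinAssign
    [ "1.lock" "2.lock" ]
    [ "a.example" "b.example" "c.example" ];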
  attrsToList = lib.mapAttrsToList (attrname: attrval: {name = attrname; value = attrval;});
  # for an AttrSet `funcsAttrs` having functions as values, apply single arguments from
  # `argsList` to them in a round-robin manner.
  # Returns an attribute set with the applied functions as values.
  roundRobinApplyAttrs = funcsAttrs: argsList: lib.listToAttrs (map (x: {inherit (x.snd) name; value = x.snd.value x.fst;}) (roundRobinAssign argsList (attrsToList funcsAttrs)));
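  # Illustrative sketch, not part of the upstream module: the binding name
  # `_exampleApplied` and its arguments are hypothetical. Each function value
  # receives one lock file name; attribute names are visited in sorted order,
  # so the expression evaluates to:
  #   { a = "a uses 1.lock"; b = "b uses 2.lock"; c = "c uses 1.lock"; }
  _exampleApplied = roundRobinApplyAttrs
    {
      a = lock: "a uses ${lock}";
      b = lock: "b uses ${lock}";
      c = lock: "c uses ${lock}";
    }
    [ "1.lock" "2.lock" ];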
  wrapInFlock = lockfilePath: script:
    # explainer: https://stackoverflow.com/a/60896531
    ''
      exec {LOCKFD}> ${lockfilePath}
      echo "Waiting to acquire lock ${lockfilePath}"
      ${pkgs.flock}/bin/flock ''${LOCKFD} || exit 1
      echo "Acquired lock ${lockfilePath}"
    ''
    + script + "\n"
    + ''echo "Releasing lock ${lockfilePath}"  # only released after process exit'';
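  # Illustrative sketch, not part of the upstream module: the binding name
  # `_exampleLockedScript` and the wrapped command are hypothetical.
  # In the generated bash script, `exec {LOCKFD}> file` opens the lock file on
  # a free file descriptor and stores its number in the variable LOCKFD.
  # `flock` then blocks until it holds an exclusive lock on that descriptor.
  # The lock stays held while the descriptor is open, that is, until the
  # script's shell exits. The result is roughly the wrapped command
  # surrounded by the "Waiting", "Acquired" and "Releasing" messages for
  # /run/acme/1.lock.
  _exampleLockedScript = wrapInFlock "${lockdir}1.lock" ''
    echo "renewing certificate"
  '';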

  # There are many services required to make cert renewals work.
  # They all follow a common structure:
  #   - They inherit this commonServiceConfig
  #   - They all run as the acme user
  #   - They all use BindPath and StateDirectory where possible
  #     to set up a sort of build environment in /tmp
  # The Group can vary depending on what the user has specified in
  # security.acme.certs.<cert>.group on some of the services.
  commonServiceConfig = {
    Type = "oneshot";
    User = user;
    Group = lib.mkDefault "acme";
    UMask = "0022";
    StateDirectoryMode = "750";
    ProtectSystem = "strict";
    ReadWritePaths = [
      "/var/lib/acme"
      lockdir
    ];
    PrivateTmp = true;

    WorkingDirectory = "/tmp";

    CapabilityBoundingSet = [ "" ];
    DevicePolicy = "closed";
    LockPersonality = true;
    MemoryDenyWriteExecute = true;
    NoNewPrivileges = true;
    PrivateDevices = true;
    ProtectClock = true;
    ProtectHome = true;
    ProtectHostname = true;
    ProtectControlGroups = true;
    ProtectKernelLogs = true;
    ProtectKernelModules = true;
    ProtectKernelTunables = true;
    ProtectProc = "invisible";
    ProcSubset = "pid";
    RemoveIPC = true;
    RestrictAddressFamilies = [
      "AF_INET"
      "AF_INET6"
    ];
    RestrictNamespaces = true;
    RestrictRealtime = true;
    RestrictSUIDSGID = true;
    SystemCallArchitectures = "native";
    SystemCallFilter = [
      # 1. allow a reasonable set of syscalls
      "@system-service @resources"
      # 2. and deny unreasonable ones
      "~@privileged"
      # 3. then allow the required subset within denied groups
      "@chown"
    ];
  };

  # In order to avoid race conditions creating the CA for selfsigned certs,
  # we have a separate service which will create the necessary files.
  selfsignCAService = {
    description = "Generate self-signed certificate authority";

    path = with pkgs; [ minica ];

    unitConfig = {
      ConditionPathExists = "!/var/lib/acme/.minica/key.pem";
      StartLimitIntervalSec = 0;
    };

    serviceConfig = commonServiceConfig // {
      StateDirectory = "acme/.minica";
      BindPaths = "/var/lib/acme/.minica:/tmp/ca";
      UMask = "0077";
    };

    # Working directory will be /tmp
    script = ''
      minica \
        --ca-key ca/key.pem \
        --ca-cert ca/cert.pem \
        --domains selfsigned.local
    '';
  };

  # Ensures that directories which are shared across all certs
  # exist and have the correct user and group, since group
  # is configurable on a per-cert basis.
  userMigrationService = let
    script = with builtins; ''
      chown -R ${user} .lego/accounts
    '' + (lib.concatStringsSep "\n" (lib.mapAttrsToList (cert: data: ''
      for fixpath in ${lib.escapeShellArg cert} .lego/${lib.escapeShellArg cert}; do
        if [ -d "$fixpath" ]; then
          chmod -R u=rwX,g=rX,o= "$fixpath"
          chown -R ${user}:${data.group} "$fixpath"
        fi
      done
    '') certConfigs));
  in {
    description = "Fix owner and group of all ACME certificates";

    serviceConfig = commonServiceConfig // {
      # We don't want this to run every time a renewal happens
      RemainAfterExit = true;

      # StateDirectory entries are a cleaner, service-level mechanism
      # for dealing with persistent service data
      StateDirectory = [ "acme" "acme/.lego" "acme/.lego/accounts" ];
      StateDirectoryMode = 755;
      WorkingDirectory = "/var/lib/acme";

      # Run the start script as root
      ExecStart = "+" + (pkgs.writeShellScript "acme-fixperms" script);
    };
  };
  lockfilePrepareService = {
    description = "Manage lock files for acme services";

    # ensure all required lock files exist, but none more
    script = ''
      GLOBIGNORE="${lib.concatStringsSep ":" concurrencyLockfiles}"
      rm -f -- *
      unset GLOBIGNORE

      xargs touch <<< "${toString concurrencyLockfiles}"
    '';

    serviceConfig = commonServiceConfig // {
      # We don't want this to run every time a renewal happens
      RemainAfterExit = true;
      WorkingDirectory = lockdir;
    };
  };


  certToConfig = cert: data: let
    acmeServer = data.server;
    useDns = data.dnsProvider != null;
    useDnsOrS3 = useDns || data.s3Bucket != null;
    destPath = "/var/lib/acme/${cert}";
    selfsignedDeps = lib.optionals (cfg.preliminarySelfsigned) [ "acme-selfsigned-${cert}.service" ];

    # Minica and lego have a "feature" which replaces * with _. We need
    # to make this substitution to reference the output files from both programs.
    # End users never see this since we rename the certs.
    keyName = builtins.replaceStrings ["*"] ["_"] data.domain;
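    # Illustrative sketch, not part of the upstream module: the binding name
    # `_exampleKeyName` and the domain are hypothetical. For a wildcard domain
    # the substitution evaluates to "_.example.org", which is the form used in
    # the file names written by both programs.
    _exampleKeyName = builtins.replaceStrings ["*"] ["_"] "*.example.org";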

    # FIXME when mkChangedOptionModule supports submodules, change to that.
    # This is a workaround
    extraDomains = data.extraDomainNames ++ (
      lib.optionals
      (data.extraDomains != "_mkMergedOptionModule")
      (builtins.attrNames data.extraDomains)
    );

    # Create hashes for cert data directories based on configuration
    # Flags are separated to avoid collisions
    hashData = with builtins; ''
      ${lib.concatStringsSep " " data.extraLegoFlags} -
      ${lib.concatStringsSep " " data.extraLegoRunFlags} -
      ${lib.concatStringsSep " " data.extraLegoRenewFlags} -
      ${toString acmeServer} ${toString data.dnsProvider}
      ${toString data.ocspMustStaple} ${data.keyType}
    '';
    certDir = mkHash hashData;
    # TODO remove domainHash usage entirely. Waiting on go-acme/lego#1532
    domainHash = mkHash "${lib.concatStringsSep " " extraDomains} ${data.domain}";
    accountHash = (mkAccountHash acmeServer data);
    accountDir = accountDirRoot + accountHash;

    protocolOpts = if useDns then (
      [ "--dns" data.dnsProvider ]
      ++ lib.optionals (!data.dnsPropagationCheck) [ "--dns.propagation-disable-ans" ]
      ++ lib.optionals (data.dnsResolver != null) [ "--dns.resolvers" data.dnsResolver ]
    ) else if data.s3Bucket != null then [ "--http" "--http.s3-bucket" data.s3Bucket ]
    else if data.listenHTTP != null then [ "--http" "--http.port" data.listenHTTP ]
    else [ "--http" "--http.webroot" data.webroot ];

    commonOpts = [
      "--accept-tos" # Checking the option is covered by the assertions
      "--path" "."
      "-d" data.domain
      "--email" data.email
      "--key-type" data.keyType
    ] ++ protocolOpts
      ++ lib.optionals (acmeServer != null) [ "--server" acmeServer ]
      ++ lib.concatMap (name: [ "-d" name ]) extraDomains
      ++ data.extraLegoFlags;

    # Although --must-staple is common to both modes, it is not declared as a
    # mode-agnostic argument in lego and thus must come after the mode.
    runOpts = lib.escapeShellArgs (
      commonOpts
      ++ [ "run" ]
      ++ lib.optionals data.ocspMustStaple [ "--must-staple" ]
      ++ data.extraLegoRunFlags
    );
    renewOpts = lib.escapeShellArgs (
      commonOpts
      ++ [ "renew" "--no-random-sleep" ]
      ++ lib.optionals data.ocspMustStaple [ "--must-staple" ]
      ++ data.extraLegoRenewFlags
    );

    # We need to collect all the ACME webroots to grant them write
    # access in the systemd service.
    webroots =
      lib.remove null
        (lib.unique
          (builtins.map
            (certAttrs: certAttrs.webroot)
            (lib.attrValues config.security.acme.certs)));
  in {
    inherit accountHash cert selfsignedDeps;

    group = data.group;

    renewTimer = {
      description = "Renew ACME Certificate for ${cert}";
      wantedBy = [ "timers.target" ];
      timerConfig = {
        OnCalendar = data.renewInterval;
        Unit = "acme-${cert}.service";
        Persistent = "yes";

        # Allow systemd to pick a convenient time within the day
        # to run the check.
        # This allows the coalescing of multiple timer jobs.
        # We divide by the number of certificates so that if you
        # have many certificates, the renewals are distributed over
        # the course of the day to avoid rate limits.
        AccuracySec = "${toString (_24hSecs / numCerts)}s";
|
|
|
|
# Skew randomly within the day, per https://letsencrypt.org/docs/integration-guide/.
|
|
|
|
RandomizedDelaySec = "24h";
|
2022-10-04 21:28:23 +00:00
|
|
|
FixedRandomDelay = true;
|
2020-06-19 19:27:46 +00:00
|
|
|
};
|
|
|
|
};
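# Worked example for the AccuracySec computation above (a sketch; it assumes
# _24hSecs is the number of seconds in a day, 86400, and that numCerts is the
# number of configured certificates; neither definition appears in this excerpt).
# Nix uses integer division when both operands are integers, so:
#   numCerts = 4  ->  86400 / 4 = 21600  ->  AccuracySec = "21600s" (6 hours)
#   numCerts = 7  ->  86400 / 7 = 12342  ->  AccuracySec = "12342s" (remainder dropped)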
|
|
|
|
|
2023-07-18 09:20:33 +00:00
|
|
|
selfsignService = lockfileName: {
|
2020-06-19 19:27:46 +00:00
|
|
|
description = "Generate self-signed certificate for ${cert}";
|
2024-08-29 22:46:31 +00:00
|
|
|
after = [ "acme-selfsigned-ca.service" "acme-fixperms.service" ] ++ lib.optional (cfg.maxConcurrentRenewals > 0) "acme-lockfiles.service";
|
|
|
|
requires = [ "acme-selfsigned-ca.service" "acme-fixperms.service" ] ++ lib.optional (cfg.maxConcurrentRenewals > 0) "acme-lockfiles.service";
|
2020-06-19 19:27:46 +00:00
|
|
|
|
|
|
|
path = with pkgs; [ minica ];
|
|
|
|
|
|
|
|
unitConfig = {
|
|
|
|
ConditionPathExists = "!/var/lib/acme/${cert}/key.pem";
|
2021-11-28 22:48:43 +00:00
|
|
|
StartLimitIntervalSec = 0;
|
2020-06-19 19:27:46 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
serviceConfig = commonServiceConfig // {
|
|
|
|
Group = data.group;
|
2022-10-28 15:23:44 +00:00
|
|
|
UMask = "0027";
|
2020-06-19 19:27:46 +00:00
|
|
|
|
|
|
|
StateDirectory = "acme/${cert}";
|
|
|
|
|
2020-12-29 15:01:08 +00:00
|
|
|
BindPaths = [
|
|
|
|
"/var/lib/acme/.minica:/tmp/ca"
|
|
|
|
"/var/lib/acme/${cert}:/tmp/${keyName}"
|
|
|
|
];
|
2020-06-19 19:27:46 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
# Working directory will be /tmp
|
|
|
|
# minica will output to a folder sharing the name of the first domain
|
|
|
|
# in the list, which will be ${data.domain}
|
2023-07-18 09:20:33 +00:00
|
|
|
script = (if (lockfileName == null) then lib.id else wrapInFlock "${lockdir}${lockfileName}") ''
|
2020-06-19 19:27:46 +00:00
|
|
|
minica \
|
|
|
|
--ca-key ca/key.pem \
|
|
|
|
--ca-cert ca/cert.pem \
|
2024-08-29 22:46:31 +00:00
|
|
|
--domains ${lib.escapeShellArg (builtins.concatStringsSep "," ([ data.domain ] ++ extraDomains))}
|
2020-06-19 19:27:46 +00:00
|
|
|
|
|
|
|
# Create files to match directory layout for real certificates
|
|
|
|
cd '${keyName}'
|
|
|
|
cp ../ca/cert.pem chain.pem
|
|
|
|
cat cert.pem chain.pem > fullchain.pem
|
|
|
|
cat key.pem fullchain.pem > full.pem
|
|
|
|
|
|
|
|
# Group might change between runs, re-apply it
|
2024-05-13 22:01:18 +00:00
|
|
|
chown '${user}:${data.group}' -- *
|
2021-05-04 23:27:19 +00:00
|
|
|
|
|
|
|
# Default permissions make the files unreadable by group + anon
|
|
|
|
# Need to be readable by group
|
2024-05-13 22:01:18 +00:00
|
|
|
chmod 640 -- *
|
2020-06-19 19:27:46 +00:00
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2023-07-18 09:20:33 +00:00
|
|
|
renewService = lockfileName: {
|
2020-06-19 19:27:46 +00:00
|
|
|
description = "Renew ACME certificate for ${cert}";
|
2024-08-29 22:46:31 +00:00
|
|
|
after = [ "network.target" "network-online.target" "acme-fixperms.service" "nss-lookup.target" ] ++ selfsignedDeps ++ lib.optional (cfg.maxConcurrentRenewals > 0) "acme-lockfiles.service";
|
|
|
|
wants = [ "network-online.target" "acme-fixperms.service" ] ++ selfsignedDeps ++ lib.optional (cfg.maxConcurrentRenewals > 0) "acme-lockfiles.service";
|
2020-06-19 19:27:46 +00:00
|
|
|
|
|
|
|
# https://github.com/NixOS/nixpkgs/pull/81371#issuecomment-605526099
|
2024-08-29 22:46:31 +00:00
|
|
|
wantedBy = lib.optionals (!config.boot.isContainer) [ "multi-user.target" ];
|
2020-06-19 19:27:46 +00:00
|
|
|
|
2021-03-01 10:53:24 +00:00
|
|
|
path = with pkgs; [ lego coreutils diffutils openssl ];
|
2020-06-19 19:27:46 +00:00
|
|
|
|
|
|
|
serviceConfig = commonServiceConfig // {
|
|
|
|
Group = data.group;
|
|
|
|
|
2024-10-02 09:34:50 +00:00
|
|
|
# Let's Encrypt's Failed Validation limit allows 5 failures per account, per hostname, per hour.
|
2023-11-14 19:29:50 +00:00
|
|
|
# This avoids eating them all up if something is misconfigured upon the first try.
|
|
|
|
RestartSec = 15 * 60;
|
|
|
|
|
2020-12-13 22:19:53 +00:00
|
|
|
# Keep in mind that these directories will be deleted if the user runs
|
|
|
|
# systemctl clean --what=state
|
|
|
|
# acme/.lego/${cert} is listed for this reason.
|
2020-12-29 15:01:08 +00:00
|
|
|
StateDirectory = [
|
|
|
|
"acme/${cert}"
|
|
|
|
"acme/.lego/${cert}"
|
|
|
|
"acme/.lego/${cert}/${certDir}"
|
|
|
|
"acme/.lego/accounts/${accountHash}"
|
|
|
|
];
|
2020-06-19 19:27:46 +00:00
|
|
|
|
2021-09-24 11:09:37 +00:00
|
|
|
ReadWritePaths = commonServiceConfig.ReadWritePaths ++ webroots;
|
|
|
|
|
2020-06-19 19:27:46 +00:00
|
|
|
# Needs to be space separated, but can't use a multiline string because that'll include newlines
|
2020-12-29 15:01:08 +00:00
|
|
|
BindPaths = [
|
|
|
|
"${accountDir}:/tmp/accounts"
|
|
|
|
"/var/lib/acme/${cert}:/tmp/out"
|
|
|
|
"/var/lib/acme/.lego/${cert}/${certDir}:/tmp/certificates"
|
|
|
|
];
|
2020-06-19 19:27:46 +00:00
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
EnvironmentFile = lib.mkIf useDnsOrS3 data.environmentFile;
|
2020-09-04 17:48:47 +00:00
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
Environment = lib.mkIf useDnsOrS3
|
|
|
|
(lib.mapAttrsToList (k: v: ''"${k}=%d/${k}"'') data.credentialFiles);
|
2023-07-20 10:44:11 +00:00
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
LoadCredential = lib.mkIf useDnsOrS3
|
|
|
|
(lib.mapAttrsToList (k: v: "${k}:${v}") data.credentialFiles);
|
2023-07-20 10:44:11 +00:00
|
|
|
|
2020-09-04 17:48:47 +00:00
|
|
|
# Run as root (Prefixed with +)
|
|
|
|
ExecStartPost = "+" + (pkgs.writeShellScript "acme-postrun" ''
|
2024-08-29 22:46:31 +00:00
|
|
|
cd /var/lib/acme/${lib.escapeShellArg cert}
|
2020-09-04 17:48:47 +00:00
|
|
|
if [ -e renewed ]; then
|
|
|
|
rm renewed
|
|
|
|
${data.postRun}
|
2024-08-29 22:46:31 +00:00
|
|
|
${lib.optionalString (data.reloadServices != [])
|
|
|
|
"systemctl --no-block try-reload-or-restart ${lib.escapeShellArgs data.reloadServices}"
|
2021-10-06 09:53:04 +00:00
|
|
|
}
|
2020-09-04 17:48:47 +00:00
|
|
|
fi
|
|
|
|
'');
|
2024-08-29 22:46:31 +00:00
|
|
|
} // lib.optionalAttrs (data.listenHTTP != null && lib.toInt (lib.last (lib.splitString ":" data.listenHTTP)) < 1024) {
|
2021-12-18 14:52:32 +00:00
|
|
|
CapabilityBoundingSet = [ "CAP_NET_BIND_SERVICE" ];
|
2022-09-18 20:27:11 +00:00
|
|
|
AmbientCapabilities = [ "CAP_NET_BIND_SERVICE" ];
|
2021-01-09 19:34:54 +00:00
|
|
|
};
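# Illustration of the port check above (a sketch using hypothetical
# listenHTTP values; it is not evaluated as part of the module):
#   lib.splitString ":" ":80"  ->  [ "" "80" ]
#   lib.last [ "" "80" ]       ->  "80"
#   lib.toInt "80"             ->  80, and 80 < 1024, so CAP_NET_BIND_SERVICE is granted
# With ":8080" the port is 8080, which is not below 1024, so no capability is added.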
|
2020-06-19 19:27:46 +00:00
|
|
|
|
|
|
|
# Working directory will be /tmp
|
2023-07-18 09:20:33 +00:00
|
|
|
script = (if (lockfileName == null) then lib.id else wrapInFlock "${lockdir}${lockfileName}") ''
|
2024-08-29 22:46:31 +00:00
|
|
|
${lib.optionalString data.enableDebugLogs "set -x"}
|
2021-11-26 12:58:40 +00:00
|
|
|
set -euo pipefail
|
2020-06-19 19:27:46 +00:00
|
|
|
|
2021-03-01 10:53:24 +00:00
|
|
|
# This reimplements the expiration date check, but without querying
|
|
|
|
# the acme server first. By doing this offline, we avoid errors
|
|
|
|
# when the network or DNS are unavailable, which can happen during
|
|
|
|
# nixos-rebuild switch.
|
|
|
|
is_expiration_skippable() {
|
|
|
|
pem=$1
|
|
|
|
|
|
|
|
# This function relies on set -e to exit early if any of the
|
|
|
|
# conditions or programs fail.
|
|
|
|
|
|
|
|
[[ -e $pem ]]
|
|
|
|
|
|
|
|
expiration_line="$(
|
|
|
|
set -euxo pipefail
|
2024-05-13 22:01:18 +00:00
|
|
|
openssl x509 -noout -enddate <"$pem" \
|
2021-03-01 10:53:24 +00:00
|
|
|
| grep notAfter \
|
|
|
|
| sed -e 's/^notAfter=//'
|
|
|
|
)"
|
|
|
|
[[ -n "$expiration_line" ]]
|
|
|
|
|
|
|
|
expiration_date="$(date -d "$expiration_line" +%s)"
|
|
|
|
now="$(date +%s)"
|
2024-05-13 22:01:18 +00:00
|
|
|
expiration_s=$((expiration_date - now))
|
|
|
|
expiration_days=$((expiration_s / (3600 * 24))) # rounds down
|
2021-03-01 10:53:24 +00:00
|
|
|
|
2021-11-28 17:03:31 +00:00
|
|
|
[[ $expiration_days -gt ${toString data.validMinDays} ]]
|
2021-03-01 10:53:24 +00:00
|
|
|
}
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
${lib.optionalString (data.webroot != null) ''
|
2021-03-15 01:33:45 +00:00
|
|
|
# Ensure the webroot exists. Fixing group is required in case configuration was changed between runs.
|
|
|
|
# Lego will fail if the webroot does not exist at all.
|
|
|
|
(
|
|
|
|
mkdir -p '${data.webroot}/.well-known/acme-challenge' \
|
|
|
|
&& chgrp '${data.group}' '${data.webroot}/.well-known/acme-challenge'
|
|
|
|
) || (
|
|
|
|
echo 'Please ensure ${data.webroot}/.well-known/acme-challenge exists and is writable by acme:${data.group}' \
|
|
|
|
&& exit 1
|
|
|
|
)
|
2021-01-09 19:34:54 +00:00
|
|
|
''}
|
|
|
|
|
2020-09-04 22:39:22 +00:00
|
|
|
echo '${domainHash}' > domainhash.txt
|
|
|
|
|
2021-11-27 00:03:35 +00:00
|
|
|
# Check if we can renew.
|
|
|
|
# We can only renew if the list of domains has not changed.
|
2022-09-19 00:07:29 +00:00
|
|
|
# We also need an account key. Avoids #190493
|
2024-05-13 22:01:18 +00:00
|
|
|
if cmp -s domainhash.txt certificates/domainhash.txt && [ -e 'certificates/${keyName}.key' ] && [ -e 'certificates/${keyName}.crt' ] && [ -n "$(find accounts -name '${data.email}.key')" ]; then
|
2020-09-04 22:39:22 +00:00
|
|
|
|
2021-11-27 00:03:35 +00:00
|
|
|
# Even if a cert is not expired, it may be revoked by the CA.
|
|
|
|
# Try to renew, and silently fail if the cert is not expired.
|
|
|
|
# Avoids #85794 and resolves #129838
|
2021-11-28 17:03:31 +00:00
|
|
|
if ! lego ${renewOpts} --days ${toString data.validMinDays}; then
|
2021-03-01 10:53:24 +00:00
|
|
|
if is_expiration_skippable out/full.pem; then
|
2021-11-28 17:03:31 +00:00
|
|
|
echo 1>&2 "nixos-acme: Ignoring failed renewal because expiration isn't within the coming ${toString data.validMinDays} days"
|
2021-03-01 10:53:24 +00:00
|
|
|
else
|
2021-11-28 17:03:31 +00:00
|
|
|
# High number to avoid Systemd reserved codes.
|
|
|
|
exit 11
|
2021-03-01 10:53:24 +00:00
|
|
|
fi
|
2020-09-04 22:39:22 +00:00
|
|
|
fi
|
2020-06-19 19:27:46 +00:00
|
|
|
|
|
|
|
# Otherwise do a full run
|
2021-11-26 21:57:31 +00:00
|
|
|
elif ! lego ${runOpts}; then
|
|
|
|
# Produce a nice error for those doing their first nixos-rebuild with these certs
|
|
|
|
echo Failed to fetch certificates. \
|
|
|
|
This may mean your DNS records are set up incorrectly. \
|
2024-08-29 22:46:31 +00:00
|
|
|
${lib.optionalString (cfg.preliminarySelfsigned) "Self-signed certs are in place and dependent services will still start."}
|
2021-11-28 17:03:31 +00:00
|
|
|
# Exit 10 so that users can potentially amend SuccessExitStatus to ignore this error.
|
|
|
|
# High number to avoid Systemd reserved codes.
|
|
|
|
exit 10
|
2020-06-19 19:27:46 +00:00
|
|
|
fi
|
|
|
|
|
2020-09-04 22:39:22 +00:00
|
|
|
mv domainhash.txt certificates/
|
2020-06-19 19:27:46 +00:00
|
|
|
|
|
|
|
# Group might change between runs, re-apply it
|
2021-12-04 19:01:18 +00:00
|
|
|
chown '${user}:${data.group}' certificates/*
|
2020-06-19 19:27:46 +00:00
|
|
|
|
|
|
|
# Copy all certs to the "real" certs directory
|
2021-11-27 00:03:35 +00:00
|
|
|
if ! cmp -s 'certificates/${keyName}.crt' out/fullchain.pem; then
|
2020-09-04 17:48:47 +00:00
|
|
|
touch out/renewed
|
2020-09-03 14:31:06 +00:00
|
|
|
echo Installing new certificate
|
|
|
|
cp -vp 'certificates/${keyName}.crt' out/fullchain.pem
|
|
|
|
cp -vp 'certificates/${keyName}.key' out/key.pem
|
|
|
|
cp -vp 'certificates/${keyName}.issuer.crt' out/chain.pem
|
2020-06-19 19:27:46 +00:00
|
|
|
ln -sf fullchain.pem out/cert.pem
|
|
|
|
cat out/key.pem out/fullchain.pem > out/full.pem
|
|
|
|
fi
|
2021-05-04 23:27:19 +00:00
|
|
|
|
|
|
|
# By default group will have no access to the cert files.
|
|
|
|
# This chmod will fix that.
|
|
|
|
chmod 640 out/*
|
2020-06-19 19:27:46 +00:00
|
|
|
'';
|
|
|
|
};
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
certConfigs = lib.mapAttrs certToConfig cfg.certs;
|
2020-06-19 19:27:46 +00:00
|
|
|
|
2021-11-28 17:03:31 +00:00
|
|
|
# These options can be specified within
|
2021-12-09 21:43:54 +00:00
|
|
|
# security.acme.defaults or security.acme.certs.<name>
|
|
|
|
inheritableModule = isDefaults: { config, ... }: let
|
|
|
|
defaultAndText = name: default: {
|
|
|
|
# When ! isDefaults then this is the option declaration for the
|
|
|
|
# security.acme.certs.<name> path, which has the extra inheritDefaults
|
|
|
|
# option, which if disabled means that we can't inherit it
|
|
|
|
default = if isDefaults || ! config.inheritDefaults then default else cfg.defaults.${name};
|
|
|
|
# The docs however don't need to depend on inheritDefaults, they should
|
|
|
|
# stay constant. Though notably it wouldn't matter much, because to get
|
|
|
|
# the option information, a submodule with name `<name>` is evaluated
|
|
|
|
# without any definitions.
|
2024-08-29 22:46:31 +00:00
|
|
|
defaultText = if isDefaults then default else lib.literalExpression "config.security.acme.defaults.${name}";
|
2021-12-09 21:43:54 +00:00
|
|
|
};
|
|
|
|
in {
|
2023-07-21 14:01:48 +00:00
|
|
|
imports = [
|
2024-08-29 22:46:31 +00:00
|
|
|
(lib.mkRenamedOptionModule [ "credentialsFile" ] [ "environmentFile" ])
|
2023-07-21 14:01:48 +00:00
|
|
|
];
|
|
|
|
|
2021-12-09 21:43:54 +00:00
|
|
|
options = {
|
2024-08-29 22:46:31 +00:00
|
|
|
validMinDays = lib.mkOption {
|
|
|
|
type = lib.types.int;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "validMinDays" 30) default defaultText;
|
2021-11-28 17:03:31 +00:00
|
|
|
description = "Minimum remaining validity before renewal in days.";
|
2020-06-19 19:27:46 +00:00
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
renewInterval = lib.mkOption {
|
|
|
|
type = lib.types.str;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "renewInterval" "daily") default defaultText;
|
2021-11-28 17:03:31 +00:00
|
|
|
description = ''
|
|
|
|
Systemd calendar expression specifying when to check for renewal. See
|
|
|
|
{manpage}`systemd.time(7)`.
|
|
|
|
'';
|
2020-06-19 19:27:46 +00:00
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
enableDebugLogs = lib.mkEnableOption "debug logging for this certificate" // {
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "enableDebugLogs" true) default defaultText;
|
2020-06-19 19:27:46 +00:00
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
webroot = lib.mkOption {
|
|
|
|
type = lib.types.nullOr lib.types.str;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "webroot" null) default defaultText;
|
2021-01-08 23:33:40 +00:00
|
|
|
example = "/var/lib/acme/acme-challenge";
|
2015-12-08 18:09:19 +00:00
|
|
|
description = ''
|
|
|
|
Where the webroot of the HTTP vhost is located.
|
|
|
|
{file}`.well-known/acme-challenge/` directory
|
2017-06-08 06:46:40 +00:00
|
|
|
will be created below the webroot if it doesn't exist.
|
2015-12-08 18:09:19 +00:00
|
|
|
`http://example.org/.well-known/acme-challenge/` must also
|
|
|
|
be available (notice unencrypted HTTP).
|
|
|
|
'';
|
2015-12-06 15:55:09 +00:00
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
server = lib.mkOption {
|
|
|
|
type = lib.types.nullOr lib.types.str;
|
2024-02-06 00:51:09 +00:00
|
|
|
inherit (defaultAndText "server" "https://acme-v02.api.letsencrypt.org/directory") default defaultText;
|
|
|
|
example = "https://acme-staging-v02.api.letsencrypt.org/directory";
|
2019-10-25 22:40:51 +00:00
|
|
|
description = ''
|
2024-02-06 00:51:09 +00:00
|
|
|
ACME Directory Resource URI.
|
|
|
|
Defaults to Let's Encrypt's production endpoint.
|
|
|
|
For testing, Let's Encrypt's [staging endpoint](https://letsencrypt.org/docs/staging-environment/)
|
|
|
|
should be used to avoid the rather tight rate limit on the production endpoint.
|
2019-10-25 22:40:51 +00:00
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
email = lib.mkOption {
|
|
|
|
type = lib.types.nullOr lib.types.str;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "email" null) default defaultText;
|
2021-11-28 17:03:31 +00:00
|
|
|
description = ''
|
|
|
|
Email address for account creation and correspondence from the CA.
|
|
|
|
It is recommended to use the same email for all certs to avoid account
|
|
|
|
creation limits.
|
|
|
|
'';
|
2015-12-06 15:55:09 +00:00
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
group = lib.mkOption {
|
|
|
|
type = lib.types.str;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "group" "acme") default defaultText;
|
2015-12-11 16:42:17 +00:00
|
|
|
description = "Group running the ACME client.";
|
2015-12-08 18:09:19 +00:00
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
reloadServices = lib.mkOption {
|
|
|
|
type = lib.types.listOf lib.types.str;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "reloadServices" []) default defaultText;
|
2021-10-06 09:53:04 +00:00
|
|
|
description = ''
|
|
|
|
The list of systemd services to call `systemctl try-reload-or-restart`
|
|
|
|
on.
|
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
postRun = lib.mkOption {
|
|
|
|
type = lib.types.lines;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "postRun" "") default defaultText;
|
2020-06-19 19:27:46 +00:00
|
|
|
example = "cp full.pem backup.pem";
|
2015-12-08 18:09:19 +00:00
|
|
|
description = ''
|
2020-06-19 19:27:46 +00:00
|
|
|
Commands to run after new certificates go live. Note that
|
2020-09-04 17:48:47 +00:00
|
|
|
these commands run as the root user.
|
2017-11-19 15:41:28 +00:00
|
|
|
|
|
|
|
Executed in the same directory with the new certificate.
|
2015-12-08 18:09:19 +00:00
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
keyType = lib.mkOption {
|
|
|
|
type = lib.types.str;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "keyType" "ec256") default defaultText;
|
2020-01-19 18:24:04 +00:00
|
|
|
description = ''
|
|
|
|
Key type to use for private keys.
|
|
|
|
For an up-to-date list of supported values, check the --key-type option
|
2023-10-22 03:31:04 +00:00
|
|
|
at <https://go-acme.github.io/lego/usage/cli/options/>.
|
2020-01-19 18:24:04 +00:00
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
dnsProvider = lib.mkOption {
|
|
|
|
type = lib.types.nullOr lib.types.str;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "dnsProvider" null) default defaultText;
|
2020-01-15 09:17:11 +00:00
|
|
|
example = "route53";
|
2020-01-19 18:24:04 +00:00
|
|
|
description = ''
|
|
|
|
DNS Challenge provider. For a list of supported providers, see the "code"
|
2020-04-21 13:39:46 +00:00
|
|
|
field of the DNS providers listed at <https://go-acme.github.io/lego/dns/>.
|
2020-01-19 18:24:04 +00:00
|
|
|
'';
|
2020-01-12 21:05:57 +00:00
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
dnsResolver = lib.mkOption {
|
|
|
|
type = lib.types.nullOr lib.types.str;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "dnsResolver" null) default defaultText;
|
2020-10-07 11:01:08 +00:00
|
|
|
example = "1.1.1.1:53";
|
|
|
|
description = ''
|
|
|
|
Set the resolver to use for performing recursive DNS queries. Supported:
|
|
|
|
host:port. The default is to use the system resolvers, or Google's DNS
|
|
|
|
resolvers if the system's cannot be determined.
|
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
environmentFile = lib.mkOption {
|
|
|
|
type = lib.types.nullOr lib.types.path;
|
2023-07-21 14:01:48 +00:00
|
|
|
inherit (defaultAndText "environmentFile" null) default defaultText;
|
2020-01-12 21:05:57 +00:00
|
|
|
description = ''
|
2020-01-19 18:24:04 +00:00
|
|
|
Path to an EnvironmentFile for the cert's service containing any required and
|
|
|
|
optional environment variables for your selected dnsProvider.
|
|
|
|
To find out what values you need to set, consult the documentation at
|
2020-04-21 13:39:46 +00:00
|
|
|
<https://go-acme.github.io/lego/dns/> for the corresponding dnsProvider.
|
2020-01-12 21:05:57 +00:00
|
|
|
'';
|
|
|
|
example = "/var/src/secrets/example.org-route53-api-token";
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
credentialFiles = lib.mkOption {
|
|
|
|
type = lib.types.attrsOf (lib.types.path);
|
2023-07-20 10:44:11 +00:00
|
|
|
inherit (defaultAndText "credentialFiles" {}) default defaultText;
|
|
|
|
description = ''
|
|
|
|
Environment variables suffixed by "_FILE" to set for the cert's service
|
|
|
|
for your selected dnsProvider.
|
|
|
|
To find out what values you need to set, consult the documentation at
|
|
|
|
<https://go-acme.github.io/lego/dns/> for the corresponding dnsProvider.
|
|
|
|
This allows credential files to be passed securely to lego by leveraging systemd
|
|
|
|
credentials.
|
|
|
|
'';
|
2024-08-29 22:46:31 +00:00
|
|
|
example = lib.literalExpression ''
|
2023-07-20 10:44:11 +00:00
|
|
|
{
|
|
|
|
"RFC2136_TSIG_SECRET_FILE" = "/run/secrets/tsig-secret-example.org";
|
|
|
|
}
|
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
dnsPropagationCheck = lib.mkOption {
|
|
|
|
type = lib.types.bool;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "dnsPropagationCheck" true) default defaultText;
|
2020-01-12 21:05:57 +00:00
|
|
|
description = ''
|
2020-01-19 18:24:04 +00:00
|
|
|
Toggles lego DNS propagation check, which is used alongside DNS-01
|
|
|
|
challenge to ensure the DNS entries required are available.
|
2020-01-12 21:05:57 +00:00
|
|
|
'';
|
|
|
|
};
|
2020-02-23 01:51:19 +00:00
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
ocspMustStaple = lib.mkOption {
|
|
|
|
type = lib.types.bool;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "ocspMustStaple" false) default defaultText;
|
2020-02-23 01:51:19 +00:00
|
|
|
description = ''
|
|
|
|
Turns on the OCSP Must-Staple TLS extension.
|
|
|
|
Make sure you know what you're doing! See:
|
2022-08-29 21:34:22 +00:00
|
|
|
|
2020-02-23 01:51:19 +00:00
|
|
|
- <https://blog.apnic.net/2019/01/15/is-the-web-ready-for-ocsp-must-staple/>
|
|
|
|
- <https://blog.hboeck.de/archives/886-The-Problem-with-OCSP-Stapling-and-Must-Staple-and-why-Certificate-Revocation-is-still-broken.html>
|
|
|
|
'';
|
|
|
|
};
|
2020-02-23 02:02:44 +00:00
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
extraLegoFlags = lib.mkOption {
|
|
|
|
type = lib.types.listOf lib.types.str;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "extraLegoFlags" []) default defaultText;
|
2020-06-08 00:17:55 +00:00
|
|
|
description = ''
|
|
|
|
Additional global flags to pass to all lego commands.
|
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
extraLegoRenewFlags = lib.mkOption {
|
|
|
|
type = lib.types.listOf lib.types.str;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "extraLegoRenewFlags" []) default defaultText;
|
2020-02-23 02:02:44 +00:00
|
|
|
description = ''
|
|
|
|
Additional flags to pass to lego renew.
|
|
|
|
'';
|
|
|
|
};
|
2020-06-08 00:18:31 +00:00
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
extraLegoRunFlags = lib.mkOption {
|
|
|
|
type = lib.types.listOf lib.types.str;
|
2021-12-09 21:43:54 +00:00
|
|
|
inherit (defaultAndText "extraLegoRunFlags" []) default defaultText;
|
2020-06-08 00:18:31 +00:00
|
|
|
description = ''
|
|
|
|
Additional flags to pass to lego run.
|
|
|
|
'';
|
|
|
|
};
|
2015-12-06 15:55:09 +00:00
|
|
|
};
|
2021-12-09 21:43:54 +00:00
|
|
|
};
|
2019-10-25 22:40:51 +00:00
|
|
|
|
2021-12-04 18:09:43 +00:00
|
|
|
certOpts = { name, config, ... }: {
|
2021-12-09 21:43:54 +00:00
|
|
|
options = {
|
2021-11-28 17:03:31 +00:00
|
|
|
# user option has been removed
|
2024-08-29 22:46:31 +00:00
|
|
|
user = lib.mkOption {
|
2021-11-28 17:03:31 +00:00
|
|
|
visible = false;
|
|
|
|
default = "_mkRemovedOptionModule";
|
|
|
|
};
|
2015-12-06 15:55:09 +00:00
|
|
|
|
2021-11-28 17:03:31 +00:00
|
|
|
# allowKeysForGroup option has been removed
|
2024-08-29 22:46:31 +00:00
|
|
|
allowKeysForGroup = lib.mkOption {
|
2021-11-28 17:03:31 +00:00
|
|
|
visible = false;
|
|
|
|
default = "_mkRemovedOptionModule";
|
|
|
|
};
|
2021-11-26 12:58:40 +00:00
|
|
|
|
2021-11-28 17:03:31 +00:00
|
|
|
# extraDomains was replaced with extraDomainNames
|
2024-08-29 22:46:31 +00:00
|
|
|
extraDomains = lib.mkOption {
|
2021-11-28 17:03:31 +00:00
|
|
|
visible = false;
|
|
|
|
default = "_mkMergedOptionModule";
|
2020-01-12 21:05:57 +00:00
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
directory = lib.mkOption {
|
|
|
|
type = lib.types.str;
|
2021-11-28 17:03:31 +00:00
|
|
|
readOnly = true;
|
|
|
|
default = "/var/lib/acme/${name}";
|
|
|
|
description = "Directory where certificate and other state is stored.";
|
2015-12-12 13:21:44 +00:00
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
domain = lib.mkOption {
|
|
|
|
type = lib.types.str;
|
2021-11-28 17:03:31 +00:00
|
|
|
default = name;
|
|
|
|
description = "Domain to fetch certificate for (defaults to the entry name).";
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
extraDomainNames = lib.mkOption {
|
|
|
|
type = lib.types.listOf lib.types.str;
|
2021-11-28 17:03:31 +00:00
|
|
|
default = [];
|
2024-08-29 22:46:31 +00:00
|
|
|
example = lib.literalExpression ''
|
2021-11-28 17:03:31 +00:00
|
|
|
[
|
|
|
|
"example.org"
|
|
|
|
"mydomain.org"
|
|
|
|
]
|
|
|
|
'';
|
2015-12-12 13:21:44 +00:00
|
|
|
description = ''
|
2021-11-28 17:03:31 +00:00
|
|
|
A list of extra domain names, which are included in the one certificate to be issued.
|
2015-12-12 13:21:44 +00:00
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2021-11-28 17:03:31 +00:00
|
|
|
# This setting must be different for each configured certificate, otherwise
|
|
|
|
# two or more renewals may fail to bind to the address. Hence, it is not in
|
|
|
|
# the inheritableOpts.
|
2024-08-29 22:46:31 +00:00
|
|
|
listenHTTP = lib.mkOption {
|
|
|
|
type = lib.types.nullOr lib.types.str;
|
2019-10-25 22:40:51 +00:00
|
|
|
default = null;
|
2021-11-28 17:03:31 +00:00
|
|
|
example = ":1360";
|
2019-10-25 22:40:51 +00:00
|
|
|
description = ''
|
2021-11-28 17:03:31 +00:00
|
|
|
Interface and port to listen on to solve HTTP challenges
|
|
|
|
in the form [INTERFACE]:PORT.
|
|
|
|
If you use a port other than 80, you must proxy port 80 to this port.
|
2019-10-25 22:40:51 +00:00
|
|
|
'';
|
|
|
|
};
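# Usage sketch (illustrative only): solving HTTP-01 challenges on a
# non-standard port. The nginx virtual host is an assumption about how
# port 80 might be proxied to the listener; another reverse proxy would
# serve the same purpose.
#
#   security.acme.certs."example.org".listenHTTP = ":1360";
#   services.nginx.virtualHosts."example.org".locations."/.well-known/acme-challenge/" = {
#     proxyPass = "http://127.0.0.1:1360";
#   };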
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
s3Bucket = lib.mkOption {
|
|
|
|
type = lib.types.nullOr lib.types.str;
|
2023-10-25 19:08:05 +00:00
|
|
|
default = null;
|
|
|
|
example = "acme";
|
|
|
|
description = ''
|
|
|
|
S3 bucket name to use for HTTP-01 based challenges. Challenges will be written to the S3 bucket.
|
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
inheritDefaults = lib.mkOption {
|
2021-11-28 17:03:31 +00:00
|
|
|
default = true;
|
|
|
|
example = true;
|
|
|
|
description = "Whether to inherit values set in `security.acme.defaults` or not.";
|
|
|
|
type = lib.types.bool;
|
|
|
|
};
|
|
|
|
};
|
|
|
|
};
|
|
|
|
|
|
|
|
in {
|
|
|
|
|
|
|
|
options = {
|
|
|
|
security.acme = {
|
2024-08-29 22:46:31 +00:00
|
|
|
preliminarySelfsigned = lib.mkOption {
|
|
|
|
type = lib.types.bool;
|
2016-06-01 10:39:46 +00:00
|
|
|
default = true;
|
|
|
|
description = ''
|
|
|
|
Whether a preliminary self-signed certificate should be generated before
|
|
|
|
doing ACME requests. This can be useful when certificates are required in
|
|
|
|
a webserver, but ACME needs the webserver to make its requests.
|
|
|
|
|
|
|
|
With a preliminary self-signed certificate, the webserver can be started and
|
|
|
|
can later reload the correct ACME certificates.
|
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
acceptTerms = lib.mkOption {
|
|
|
|
type = lib.types.bool;
|
2020-01-19 18:24:04 +00:00
|
|
|
default = false;
|
2020-01-12 21:05:57 +00:00
|
|
|
description = ''
|
2020-04-21 13:39:46 +00:00
|
|
|
Accept the CA's terms of service. The default provider is Let's Encrypt;
|
|
|
|
you can find their ToS at <https://letsencrypt.org/repository/>.
|
2020-01-12 21:05:57 +00:00
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
useRoot = lib.mkOption {
|
|
|
|
type = lib.types.bool;
|
2021-12-04 19:01:18 +00:00
|
|
|
default = false;
|
|
|
|
description = ''
|
|
|
|
Whether to use the root user when generating certs. This is not recommended
|
2022-12-18 00:31:14 +00:00
|
|
|
for security and compatibility reasons. If a service requires root-owned certificates,
|
2021-12-04 19:01:18 +00:00
|
|
|
consider following the guide on "Using ACME with services demanding root
|
|
|
|
owned certificates" in the NixOS manual, and only using this as a fallback
|
|
|
|
or for testing.
|
|
|
|
'';
|
|
|
|
};
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
defaults = lib.mkOption {
|
|
|
|
type = lib.types.submodule (inheritableModule true);
|
2021-11-28 17:03:31 +00:00
|
|
|
description = ''
|
|
|
|
Default values inheritable by all configured certs. You can
|
|
|
|
use this to define options shared by all your certs. These defaults
|
|
|
|
can also be ignored on a per-cert basis using the
|
2023-01-21 10:06:46 +00:00
|
|
|
{option}`security.acme.certs.''${cert}.inheritDefaults` option.
|
2021-11-28 17:03:31 +00:00
|
|
|
'';
|
|
|
|
};
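# Usage sketch (illustrative only): shared settings in `defaults`, with one
# cert opting out of them. Domains, e-mail addresses, and the port are
# placeholders chosen for this example.
#
#   security.acme.defaults = {
#     email = "admin@example.org";
#     webroot = "/var/lib/acme/acme-challenge/";
#   };
#   security.acme.certs."standalone.example.org" = {
#     inheritDefaults = false;
#     email = "other@example.org";
#     listenHTTP = ":1360";
#   };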
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
certs = lib.mkOption {
|
2015-12-06 15:55:09 +00:00
|
|
|
default = { };
|
2024-08-29 22:46:31 +00:00
|
|
|
type = with lib.types; attrsOf (submodule [ (inheritableModule false) certOpts ]);
|
2015-12-06 15:55:09 +00:00
|
|
|
description = ''
|
2019-08-29 14:32:59 +00:00
|
|
|
Attribute set of certificates to get signed and renewed. Creates
|
|
|
|
`acme-''${cert}.{service,timer}` systemd units for
|
|
|
|
each certificate defined here. Other services can add dependencies
|
|
|
|
to those units if they rely on the certificates being present,
|
|
|
|
or trigger restarts of the service if certificates get renewed.
|
2015-12-06 15:55:09 +00:00
|
|
|
'';
|
2024-08-29 22:46:31 +00:00
|
|
|
example = lib.literalExpression ''
|
2017-06-08 06:46:40 +00:00
|
|
|
{
|
|
|
|
"example.com" = {
|
2021-01-08 23:33:40 +00:00
|
|
|
webroot = "/var/lib/acme/acme-challenge/";
|
2017-06-08 06:46:40 +00:00
|
|
|
email = "foo@example.com";
|
2020-06-19 19:27:46 +00:00
|
|
|
extraDomainNames = [ "www.example.com" "foo.example.com" ];
|
2017-06-08 06:46:40 +00:00
|
|
|
};
|
|
|
|
"bar.example.com" = {
|
2021-01-08 23:33:40 +00:00
|
|
|
webroot = "/var/lib/acme/acme-challenge/";
|
2017-06-08 06:46:40 +00:00
|
|
|
email = "bar@example.com";
|
|
|
|
};
|
|
|
|
}
|
|
|
|
'';
|
2015-12-06 15:55:09 +00:00
|
|
|
};
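# Usage sketch (illustrative only): ordering a hypothetical service named
# `myservice` after the unit that obtains the "example.com" certificate.
# The service name is made up; the unit name follows the
# acme-<cert>.service pattern described above.
#
#   systemd.services.myservice = {
#     wants = [ "acme-example.com.service" ];
#     after = [ "acme-example.com.service" ];
#   };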
|
2024-08-29 22:46:31 +00:00
|
|
|
maxConcurrentRenewals = lib.mkOption {
|
2023-07-18 09:20:33 +00:00
|
|
|
default = 5;
|
2024-08-29 22:46:31 +00:00
|
|
|
type = lib.types.int;
|
2023-07-18 09:20:33 +00:00
|
|
|
description = ''
|
|
|
|
Maximum number of concurrent certificate generation or renewal jobs. All other
|
|
|
|
jobs will queue and wait for running jobs to finish. Reduces the system load of
|
|
|
|
certificate generation.
|
|
|
|
|
|
|
|
Set to `0` to allow an unlimited number of concurrent job runs.
|
|
|
|
'';
|
|
|
|
};
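# Usage sketch (illustrative only): allow at most two generation or renewal
# jobs at a time,
#
#   security.acme.maxConcurrentRenewals = 2;
#
# or lift the limit entirely:
#
#   security.acme.maxConcurrentRenewals = 0;
#
# Per the commit that introduced this option, jobs are assigned their
# lockfile slot at build time, so the permitted concurrency may not always
# be fully used at runtime.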
|
2015-12-06 15:55:09 +00:00
|
|
|
};
|
|
|
|
};
|
|
|
|
|
2020-06-19 19:27:46 +00:00
|
|
|
imports = [
|
2024-08-29 22:46:31 +00:00
|
|
|
(lib.mkRemovedOptionModule [ "security" "acme" "production" ] ''
|
2020-06-19 19:27:46 +00:00
|
|
|
Use security.acme.server to define your staging ACME server URL instead.
|
|
|
|
|
|
|
|
To use the Let's Encrypt staging server, set security.acme.server =
|
|
|
|
"https://acme-staging-v02.api.letsencrypt.org/directory".
|
2021-11-28 17:03:31 +00:00
|
|
|
'')
|
2024-08-29 22:46:31 +00:00
|
|
|
(lib.mkRemovedOptionModule [ "security" "acme" "directory" ] "ACME Directory is now hardcoded to /var/lib/acme and its permissions are managed by systemd. See https://github.com/NixOS/nixpkgs/issues/53852 for more info.")
|
|
|
|
(lib.mkRemovedOptionModule [ "security" "acme" "preDelay" ] "This option has been removed. If you want to make sure that something executes before certificates are provisioned, add a RequiredBy=acme-\${cert}.service to the service you want to execute before the cert renewal")
|
|
|
|
(lib.mkRemovedOptionModule [ "security" "acme" "activationDelay" ] "This option has been removed. If you want to make sure that something executes before certificates are provisioned, add a RequiredBy=acme-\${cert}.service to the service you want to execute before the cert renewal")
|
|
|
|
(lib.mkChangedOptionModule [ "security" "acme" "validMin" ] [ "security" "acme" "defaults" "validMinDays" ] (config: config.security.acme.validMin / (24 * 3600)))
|
|
|
|
(lib.mkChangedOptionModule [ "security" "acme" "validMinDays" ] [ "security" "acme" "defaults" "validMinDays" ] (config: config.security.acme.validMinDays))
|
|
|
|
(lib.mkChangedOptionModule [ "security" "acme" "renewInterval" ] [ "security" "acme" "defaults" "renewInterval" ] (config: config.security.acme.renewInterval))
|
|
|
|
(lib.mkChangedOptionModule [ "security" "acme" "email" ] [ "security" "acme" "defaults" "email" ] (config: config.security.acme.email))
|
|
|
|
(lib.mkChangedOptionModule [ "security" "acme" "server" ] [ "security" "acme" "defaults" "server" ] (config: config.security.acme.server))
|
|
|
|
(lib.mkChangedOptionModule [ "security" "acme" "enableDebugLogs" ] [ "security" "acme" "defaults" "enableDebugLogs" ] (config: config.security.acme.enableDebugLogs))
|
2020-06-19 19:27:46 +00:00
|
|
|
];
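# Worked example for the validMin conversion above (illustrative only): the
# old option is divided by 24 * 3600, that is, converted from seconds to
# days, so a configuration containing
#
#   security.acme.validMin = 2592000;   # 30 * 24 * 3600 seconds
#
# is carried over as
#
#   security.acme.defaults.validMinDays = 30;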
|
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
config = lib.mkMerge [
|
|
|
|
(lib.mkIf (cfg.certs != { }) {
|
2015-12-12 15:06:24 +00:00
|
|
|
|
2020-06-19 19:27:46 +00:00
|
|
|
# FIXME Most of these custom warnings and filters for security.acme.certs.* are required
|
|
|
|
# because using mkRemovedOptionModule/mkChangedOptionModule with attrsets isn't possible.
|
2024-08-29 22:46:31 +00:00
|
|
|
warnings = lib.filter (w: w != "") (lib.mapAttrsToList (cert: data: lib.optionalString (data.extraDomains != "_mkMergedOptionModule") ''
|
2020-06-19 19:27:46 +00:00
|
|
|
The option definition `security.acme.certs.${cert}.extraDomains` has changed
|
|
|
|
to `security.acme.certs.${cert}.extraDomainNames` and is now a list of strings.
|
|
|
|
Setting a custom webroot for extra domains is not possible, instead use separate certs.
|
2023-03-19 20:44:31 +00:00
|
|
|
'') cfg.certs);
|
2020-06-19 19:27:46 +00:00
|
|
|
|
2020-01-12 21:05:57 +00:00
|
|
|
assertions = let
|
2024-08-29 22:46:31 +00:00
|
|
|
certs = lib.attrValues cfg.certs;
|
2020-01-12 21:05:57 +00:00
|
|
|
in [
|
|
|
|
{
|
2024-08-29 22:46:31 +00:00
|
|
|
assertion = cfg.defaults.email != null || lib.all (certOpts: certOpts.email != null) certs;
|
2020-01-12 21:05:57 +00:00
|
|
|
message = ''
|
|
|
|
You must define `security.acme.certs.<name>.email` or
|
2024-01-19 21:28:56 +00:00
|
|
|
`security.acme.defaults.email` to register with the CA. Note that using
|
2020-06-19 19:27:46 +00:00
|
|
|
many different addresses for certs may trigger account rate limits.
|
2020-01-12 21:05:57 +00:00
|
|
|
'';
|
|
|
|
}
|
2020-01-19 18:24:04 +00:00
|
|
|
{
|
|
|
|
assertion = cfg.acceptTerms;
|
|
|
|
message = ''
|
|
|
|
You must accept the CA's terms of service before using
|
|
|
|
the ACME module by setting `security.acme.acceptTerms`
|
|
|
|
to `true`. For Let's Encrypt's ToS see https://letsencrypt.org/repository/
|
|
|
|
'';
|
|
|
|
}
|
2024-08-29 22:46:31 +00:00
|
|
|
] ++ (builtins.concatLists (lib.mapAttrsToList (cert: data: [
|
2020-06-19 19:27:46 +00:00
|
|
|
{
|
|
|
|
assertion = data.user == "_mkRemovedOptionModule";
|
|
|
|
message = ''
|
|
|
|
The option definition `security.acme.certs.${cert}.user' no longer has any effect; please remove it.
|
|
|
|
Certificate user is now hard coded to the "acme" user. If you would
|
|
|
|
like another user to have access, consider adding them to the
|
|
|
|
"acme" group or changing security.acme.certs.${cert}.group.
|
|
|
|
'';
|
|
|
|
}
|
|
|
|
{
|
|
|
|
assertion = data.allowKeysForGroup == "_mkRemovedOptionModule";
|
|
|
|
message = ''
|
|
|
|
The option definition `security.acme.certs.${cert}.allowKeysForGroup' no longer has any effect; please remove it.
|
|
|
|
All certs are readable by the configured group. If this is undesired,
|
|
|
|
consider changing security.acme.certs.${cert}.group to an unused group.
|
|
|
|
'';
|
|
|
|
}
|
|
|
|
# * in the cert value breaks building of systemd services, and makes
|
|
|
|
# referencing them as a user quite weird too. Best practice is to use
|
|
|
|
# the domain option.
|
|
|
|
{
|
2024-08-29 22:46:31 +00:00
|
|
|
assertion = ! lib.hasInfix "*" cert;
|
2020-06-19 19:27:46 +00:00
|
|
|
message = ''
|
|
|
|
The cert option path `security.acme.certs.${cert}.dnsProvider`
|
|
|
|
cannot contain a * character.
|
|
|
|
Instead, set `security.acme.certs.${cert}.domain = "${cert}";`
|
|
|
|
and remove the wildcard from the path.
|
|
|
|
'';
|
|
|
|
}
|
2023-10-26 09:28:43 +00:00
|
|
|
(let exclusiveAttrs = {
|
|
|
|
inherit (data) dnsProvider webroot listenHTTP s3Bucket;
|
|
|
|
}; in {
|
|
|
|
assertion = lib.length (lib.filter (x: x != null) (builtins.attrValues exclusiveAttrs)) == 1;
|
2021-05-21 08:07:24 +00:00
|
|
|
message = ''
|
2023-10-25 19:08:05 +00:00
|
|
|
Exactly one of the options
|
|
|
|
`security.acme.certs.${cert}.dnsProvider`,
|
|
|
|
`security.acme.certs.${cert}.webroot`,
|
|
|
|
`security.acme.certs.${cert}.listenHTTP` and
|
|
|
|
`security.acme.certs.${cert}.s3Bucket`
|
|
|
|
is required.
|
2023-10-26 09:28:43 +00:00
|
|
|
Current values: ${(lib.generators.toPretty {} exclusiveAttrs)}.
|
2021-05-21 08:07:24 +00:00
|
|
|
'';
|
2023-10-26 09:28:43 +00:00
|
|
|
})
|
2023-07-20 10:44:11 +00:00
|
|
|
{
|
2024-08-29 22:46:31 +00:00
|
|
|
assertion = lib.all (lib.hasSuffix "_FILE") (lib.attrNames data.credentialFiles);
|
2023-07-20 10:44:11 +00:00
|
|
|
message = ''
|
|
|
|
Option `security.acme.certs.${cert}.credentialFiles` can only be
|
|
|
|
used for variables suffixed by "_FILE".
|
|
|
|
'';
|
|
|
|
}
|
2020-06-19 19:27:46 +00:00
|
|
|
]) cfg.certs));
|
2015-12-12 15:06:24 +00:00
|
|
|
|
2020-06-19 19:27:46 +00:00
|
|
|
users.users.acme = {
|
|
|
|
home = "/var/lib/acme";
|
|
|
|
group = "acme";
|
|
|
|
isSystemUser = true;
|
|
|
|
};
|
|
|
|
|
|
|
|
users.groups.acme = {};
|
|
|
|
|
2023-07-18 09:20:33 +00:00
|
|
|
# Lock files are still managed through tmpfiles because they belong in /run.
|
|
|
|
systemd.tmpfiles.rules = [
|
|
|
|
"d ${lockdir} 0700 ${user} - - -"
|
|
|
|
"Z ${lockdir} 0700 ${user} - - -"
|
|
|
|
];
|
|
|
|
|
|
|
|
systemd.services = let
|
2024-08-29 22:46:31 +00:00
|
|
|
renewServiceFunctions = lib.mapAttrs' (cert: conf: lib.nameValuePair "acme-${cert}" conf.renewService) certConfigs;
|
2023-07-18 09:20:33 +00:00
|
|
|
renewServices = if cfg.maxConcurrentRenewals > 0
|
|
|
|
then roundRobinApplyAttrs renewServiceFunctions concurrencyLockfiles
|
2024-08-29 22:46:31 +00:00
|
|
|
else lib.mapAttrs (_: f: f null) renewServiceFunctions;
|
|
|
|
selfsignServiceFunctions = lib.mapAttrs' (cert: conf: lib.nameValuePair "acme-selfsigned-${cert}" conf.selfsignService) certConfigs;
|
2023-07-18 09:20:33 +00:00
|
|
|
selfsignServices = if cfg.maxConcurrentRenewals > 0
|
|
|
|
then roundRobinApplyAttrs selfsignServiceFunctions concurrencyLockfiles
|
2024-08-29 22:46:31 +00:00
|
|
|
else lib.mapAttrs (_: f: f null) selfsignServiceFunctions;
|
2023-07-18 09:20:33 +00:00
|
|
|
in
|
|
|
|
{ "acme-fixperms" = userMigrationService; }
|
2024-08-29 22:46:31 +00:00
|
|
|
// (lib.optionalAttrs (cfg.maxConcurrentRenewals > 0) {"acme-lockfiles" = lockfilePrepareService; })
|
2023-07-18 09:20:33 +00:00
|
|
|
// renewServices
|
2024-08-29 22:46:31 +00:00
|
|
|
// (lib.optionalAttrs (cfg.preliminarySelfsigned) ({
|
2020-06-19 19:27:46 +00:00
|
|
|
"acme-selfsigned-ca" = selfsignCAService;
|
2023-07-18 09:20:33 +00:00
|
|
|
} // selfsignServices));
|
2020-06-19 19:27:46 +00:00
|
|
|
|
2024-08-29 22:46:31 +00:00
|
|
|
systemd.timers = lib.mapAttrs' (cert: conf: lib.nameValuePair "acme-${cert}" conf.renewTimer) certConfigs;
|
2020-06-19 19:27:46 +00:00
|
|
|
|
2020-12-13 20:22:33 +00:00
|
|
|
systemd.targets = let
|
|
|
|
# Create some targets which can be depended on to be "active" after cert renewals
|
2024-08-29 22:46:31 +00:00
|
|
|
finishedTargets = lib.mapAttrs' (cert: conf: lib.nameValuePair "acme-finished-${cert}" {
|
2020-12-13 20:22:33 +00:00
|
|
|
wantedBy = [ "default.target" ];
|
2021-11-26 21:39:06 +00:00
|
|
|
requires = [ "acme-${cert}.service" ];
|
|
|
|
after = [ "acme-${cert}.service" ];
|
2020-12-13 20:22:33 +00:00
|
|
|
}) certConfigs;
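# Usage sketch (hypothetical service and cert names, not part of this module):
# a consumer that should only start once the certificate has been issued or
# renewed could order itself after the matching target, for example
#   systemd.services.my-daemon = {
#     wants = [ "acme-finished-example.org.target" ];
#     after = [ "acme-finished-example.org.target" ];
#   };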
|
|
|
|
|
|
|
|
# Create targets to limit the number of simultaneous account creations
|
2021-01-09 19:15:03 +00:00
|
|
|
# How it works:
|
|
|
|
# - Pick a "leader" cert service, which will be in charge of creating the account,
|
|
|
|
# and run first (requires + after)
|
|
|
|
# - Make all other cert services sharing the same account wait for the leader to
|
|
|
|
# finish before starting (requiredBy + before).
|
|
|
|
# Using a target here is fine - account creation is a one-time event. Even if
|
|
|
|
# systemctl clean --what=state is used to delete the account, so long as the user
|
|
|
|
# then runs one of the cert services, there won't be any issues.
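# Illustrative sketch (hypothetical cert names and account hash; assumes
# conf.cert equals the cert's attribute name): with two certs "a.example.org"
# and "b.example.org" sharing the account hash "0123abcd", the expression
# below would evaluate to roughly
#   "acme-account-0123abcd" = {
#     requires = [ "acme-a.example.org.service" ];   # leader creates the account
#     after = [ "acme-a.example.org.service" ];
#     requiredBy = [ "acme-b.example.org.service" ]; # follower waits for the target
#     before = [ "acme-b.example.org.service" ];
#   };
# The leader is simply the first entry of the group, which follows the
# attribute-name ordering of lib.attrValues.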
|
2024-08-29 22:46:31 +00:00
|
|
|
accountTargets = lib.mapAttrs' (hash: confs: let
|
2020-12-13 20:22:33 +00:00
|
|
|
leader = "acme-${(builtins.head confs).cert}.service";
|
|
|
|
dependantServices = map (conf: "acme-${conf.cert}.service") (builtins.tail confs);
|
2024-08-29 22:46:31 +00:00
|
|
|
in lib.nameValuePair "acme-account-${hash}" {
|
2020-12-13 20:22:33 +00:00
|
|
|
requiredBy = dependantServices;
|
|
|
|
before = dependantServices;
|
|
|
|
requires = [ leader ];
|
|
|
|
after = [ leader ];
|
2024-08-29 22:46:31 +00:00
|
|
|
}) (lib.groupBy (conf: conf.accountHash) (lib.attrValues certConfigs));
|
2020-12-13 20:22:33 +00:00
|
|
|
in finishedTargets // accountTargets;
|
2020-06-19 19:27:46 +00:00
|
|
|
})
|
2015-12-12 15:06:24 +00:00
|
|
|
];
|
2015-12-06 15:55:09 +00:00
|
|
|
|
2016-05-09 05:53:27 +00:00
|
|
|
meta = {
|
2020-04-20 00:36:13 +00:00
|
|
|
maintainers = lib.teams.acme.members;
|
2023-01-24 23:33:40 +00:00
|
|
|
doc = ./default.md;
|
2016-05-09 05:53:27 +00:00
|
|
|
};
|
2015-12-06 15:55:09 +00:00
|
|
|
}
|