Previously, we hardcoded a 60 second timer to stop netdata if we didn't have any answer back.
This is wrong and can cause data loss because the SIGTERM sent by systemd can sometimes be not honored.
Which in turn becomes a SIGKILL, causing potential data loss / corruption.
Offer a flag to users and bump the deadline to 2 minutes.
and `lib.getExe` allows a safe handling and potential backport of this.
But for that to work it would require 22.11 to set `pkgs.grafana-agent.meta.mainProgram = "agent"`
Relevant upstream release: https://github.com/grafana/agent/releases/tag/v0.31.0
This commit deletes the assertions that were added in 4ec456b. Those
assertions weren't even working to begin with, and they also cause
assertions leak into the generated YAML.
Invoking cadvisor sent the command line parameter `-storage_driver_user` twice, once passing `cfg.storageDriverHost`. Fix the typo and pass the host config option to the command line parameter `-storage_driver_host`
this converts meta.doc into an md pointer, not an xml pointer. since we
no longer need xml for manual chapters we can also remove support for
manual chapters from md-to-db.sh
since pandoc converts smart quotes to docbook quote elements and our
nixos-render-docs does not we lose this distinction in the rendered
output. that's probably not that bad, our stylesheet didn't make use of
this anyway (and pre-23.05 versions of the chapters didn't use quote
elements either).
also updates the nixpkgs manual to clarify that option docs support all
extensions (although it doesn't support headings at all, so heading
anchors don't work by extension).
apparently pandoc has changed behavior over the past releases, so the
files are no longer in sync. occasionally this requires edits
to the markdown source to not remove an anchor that was there
before (albeit wth a very questionable id), or where things were simply
being misrendered due to syntax errors.
This ensures that the CLI is in sync with the service configuration.
(I tried building apcupsd with --sysconfdir=/etc instead, but it wants
to install stuff there at build time, so I backed out.)
Fixes https://github.com/NixOS/nixpkgs/issues/208204.
Setting up the DeviceAllow list with explicitly configured devices was a
nice idea, but sometimes a configured device (`/dev/nvme0n1` an NVMe
namespace) has a parent device (`/dev/nvme0`) that smartctl needs to
access to query metrics.
Falling back to the block and character definitions is probably a valid
fallback.
Closes#198646
* The options `password`/`basicAuthPassword` were removed for
datasources in Grafana 9. The only option to declare them now is to use
`secureJsonData`.
* Fix description for contactPoints provisioning: when using file/env
providers, nothing will be leaked into the store.
* Fix regex in file-provider usage check: it's also possible to either
use `$__env{FOO}` or `$FOO` to fetch secrets from the environment.
* Fix warning for datasources: `password`/`basicAuthPassword` was
removed, also check for each setting in `secureJsonData` if
env/file-provider was used (then no warning is needed!).
The hack with `either` had the side-effect that the sub-options of the
submodule didn't appear in the manual. I decided to remove this because
the "migration" isn't that hard, you just need to fix some module
declarations.
However, `mkRenamedOptionModule` wouldn't work here because it'd create
a "virtual" option for the deprecated path (i.e.
`services.grafana.provision.{datasources,dashboards}`), but that's the
already a new option, i.e. the submodule for the new stuff.
To make sure that you still get errors, I implemented a small hack using
`coercedTo` which throws an error if a list is specified (as it would be
done on 22.05) which explains what to do instead to make the migration
easier.
Also, I linkified the options in the manual now to make it easier to
navigate between those.
See the discussion below the original PR[1] and #197443 for more
context.
I guess I missed that upon review because the branch was too old and I
cherry-picked the commit onto my deployment branch which is based on
22.05. Sorry for that!
[1] https://github.com/NixOS/nixpkgs/pull/162784#issuecomment-1306848036
Netdev collector needs AF_NETLINK permissions to work. It will fail with
the message "couldn't get netstats: socket: address family is not
supported by protocol" otherwise.
This commit refactors `services.grafana.provision.datasources` towards
the RFC42 style. To preserve backwards compatibility, we have to jump
through a ton of hoops, introducing esoteric type signatures and bizarre
structs. The Grafana module definition should hopefully become a lot
cleaner after a release cycle or two once the old configuration style is
completely deprecated.
This commit refactors `services.grafana.provision.dashboards` towards
the RFC42 style. To preserve backwards compatibility, we have to jump
through a ton of hoops, introducing esoteric type signatures and bizarre
structs. The Grafana module definition should hopefully become a lot
cleaner after a release cycle or two once the old configuration style is
completely deprecated.
The current `ExecStart` will not allow for multiple sockets to properly
be passed to the program since the extra newline character is interpreted to
be part of the socket path.
Allow @resources syscalls in the grafana.service unit. While Grafana
itself does not need them, some plugins (incl. first party) crash if
they fail to setrlimit. This was first seen with the official grafana
Clickhouse datasource plugin.
The @resources syscalls set is fairly harmess anyway.
conversions were done using https://github.com/pennae/nix-doc-munge
using (probably) rev f34e145 running
nix-doc-munge nixos/**/*.nix
nix-doc-munge --import nixos/**/*.nix
the tool ensures that only changes that could affect the generated
manual *but don't* are committed, other changes require manual review
and are discarded.
mostly no rendering changes. some lists (like simplelist) don't have an
exact translation to markdown, so we use a comma-separated list of
literals instead.
this mostly means marking options that use markdown already
appropriately and making a few adjustments so they still render
correctly. notable for nftables we have to transform the md links
because the manpage would not render them correctly otherwise.
this notable also now interprets a markdown-flavored list in
triton_sd_config as actual markdown and renders it differently, but this
is arguably for the better (and probably the original intention).
no other rendering changes.
there seems to be a lot of markdown in the prometheus module that
should've been docbook instead. temporarily convert it to docbook to
keep the diff for the docbook->md conversion of prometheus inspectable.
using regular strings works well for docbook because docbook is not as
whitespace-sensitive as markdown. markdown would render all of these as
code blocks when given the chance.
this renders the same in the manpage and a little more clearly in the
html manual. in the manpage there continues to be no distinction from
regular text, the html manual gets code-type markup (which was probably
the intention for most of these uses anyway).
now nix-doc-munge will not introduce whitespace changes when it replaces
manpage references with the MD equivalent.
no change to the manpage, changes to the HTML manual are whitespace only.
make (almost) all links appear on only a single line, with no
unnecessary whitespace, using double quotes for attributes. this lets us
automatically convert them to markdown easily.
the few remaining links are extremely long link in a gnome module, we'll
come back to those at a later date.
our xslt already replaces double line breaks with a paragraph close and
reopen. not using explicit para tags lets nix-doc-munge convert more
descriptions losslessly.
only whitespace changes to generated documents, except for two
strongswan options gaining paragraph two breaks they arguably should've
had anyway.
the conversion procedure is simple:
- find all things that look like options, ie calls to either `mkOption`
or `lib.mkOption` that take an attrset. remember the attrset as the
option
- for all options, find a `description` attribute who's value is not a
call to `mdDoc` or `lib.mdDoc`
- textually convert the entire value of the attribute to MD with a few
simple regexes (the set from mdize-module.sh)
- if the change produced a change in the manual output, discard
- if the change kept the manual unchanged, add some text to the
description to make sure we've actually found an option. if the
manual changes this time, keep the converted description
this procedure converts 80% of nixos options to markdown. around 2000
options remain to be inspected, but most of those fail the "does not
change the manual output check": currently the MD conversion process
does not faithfully convert docbook tags like <code> and <package>, so
any option using such tags will not be converted at all.
For reference:
- ./nixos/modules/services/monitoring/grafana.nix
- 80192f1fe3/debian/service
- 5894b9b77a/trunk/prometheus.service
I have omitted the Limit* as they do not appear to be commonly used in
NixOS, and, per `man systemd.exec`, are less preferred vs. cgroup
limits.
Move the defaults to the `config` section of the module, and apply them
with mkDefault.
That way the defaults are merged with user-provided config, and are
merged without having to use lib.mkForce.
The `bash` binary is needed for running some plugins, notably the alarm notify plugins. If the binary isn't in the path, alarms notifications aren't sent and the netdata error log instead contains `/usr/bin/env: 'bash': No such file or directory`.
Due to lack of maintenance. It is not compatible with the default
Python version (due to the tornado 5) dependency, and doesn't look
like it will be any time soon.
According to https://grafana.com/docs/agent/latest/upgrade-guide/#v0240,
this has been deprecated/moved to -server.http.address and
-server.grpc.address (accepting ip and port) config options in v0.24.0,
and already listens on localhost and not port 80 by default.
According to https://github.com/grafana/agent/pull/1540, -prometheus.*
flages were deprecated in 0.19.0 in favor of the -metrics.*
counterparts. Same applies to `loki` being renamed to `logs`.
I'm not sure if the config file format is still supported (it could be),
but we shouldn't use deprecated configs.
Make secret replacement more robust and futureproof:
- Allow any attribute in `services.parsedmarc.settings` to be a
secret if set to `{ _secret = "/path/to/secret"; }`.
- Hash secret file paths before using them as a placeholders in the
config file to minimize the risk of conflicting file paths being
replaced instead.
The old way of writing the file omited qoutes within strings which are needed by some configurations like federations.
The quotes got lost when `echo`ing the content via `echo '${builtins.toJSON x}'`.
The pkgs.formats.json does handle that race condition properly, so this commit switches the writing to that helper.
Fixes race conditions like this:
> systemd[1]: Started prometheus-kea-exporter.service.
> kea-exporter[927]: Listening on http://0.0.0.0:9547
> kea-exporter[927]: Socket at /run/kea/dhcp4.sock does not exist. Is Kea running?
> systemd[1]: prometheus-kea-exporter.service: Main process exited, code=exited, status=1/FAILURE
When no devices are given the exporter tries to autodiscover available
disks. The previous DevicePolicy was however preventing the exporter
from accessing any device at all, since only explicitly mentioned ones
were allowed.
This commit adds an allow rule for several device classes that I could
find on my machines, that gets set when no devices are explicitly
configured.
There is an existing problem with nvme devices, that expose a character
device at `/dev/nvme0`, and a (namespaced) block device at
`/dev/nvme0n1`. The character device does not come with permissions that
we could give to the exporter without further impacting the hardening.
crw------- 1 root root 247, 0 27. Jan 03:10 /dev/nvme0
brw-rw---- 1 root disk 259, 0 27. Jan 03:10 /dev/nvme0n1
The autodiscovery only finds the character device, which the exporter
unfortunately does not have access to.
However a simple udev rule can be used to resolve this:
services.udev.extraRules = ''
SUBSYSTEM=="nvme", KERNEL=="nvme[0-9]*", GROUP="disk"
'';
Unfortunately I'm not fully aware of the security implications this
change carries and we should question upstream (systemd) why they did
not include such a rule.
The disk group has no members on any of my machines.
❯ getent group disk
disk❌6:
This option makes the complete netdata configuration directory available for
modification. The default configuration is merged with changes
defined in the configDir option.
Co-authored-by: Michael Raitza <spacefrogg-github@meterriblecrew.net>
some options have default that are best described in prose, such as
defaults that depend on the system stateVersion, defaults that are
derivations specific to the surrounding context, or those where the
expression is much longer and harder to understand than a simple text
snippet.
adds defaultText for all options that use `cfg.*` values in their
defaults, but only for interpolations with no extra processing (other
than toString where necessary)
While upgrading my NixOS system I was greeted by this error:
error:
Failed assertions:
- users.users.collectd.group is unset. This used to default to
nogroup, but this is unsafe. For example you can create a group
for this user with:
users.users.collectd.group = "collectd";
users.groups.collectd = {};
Let's fix it.
Otherwise, `postfix_up{path="/var/lib/postfix/queue/public/showq"}` will
always be `0` indicating an postfix outage because this is a unix domain
socket that cannot be connected to:
2021/12/03 14:50:46 Failed to scrape showq socket: dial unix /var/lib/postfix/queue/public/showq: socket: address family not supported by protocol