- prometheus exporters are now configured with
`services.prometheus.exporters.<name>`
- the exporters are now defined by attribute sets
from which the options for each exporter are generated
- most of the exporter definitions are used unchanged,
except for some changes that shouldn't have any impact
on the functionality.
Alertmanager 0.13.0 doesn't support single dash long options, so '-config.file'
for example is parsed as '-c', which leads to the service not starting.
apps.plugin requires capabilities for full process monitoring. With
1.9.0, netdata allows multiple directories to search for plugins and the
setuid directory can be specified here.
The module is backwards compatible with older configs. A test is
included that verifies data gathering with the elevated privileges. One
additional attribute is added to make configuration more generic than
including configuration in string form.
These packages will be placed into an environment using
`backendsToPackages`. This function explicitly maps backends to
`pkgs.nodePackages.${type}` unless it's a builtin. This ensures that only
valid backends that work on NixOS are used (if not, the build already
breaks at evaluation time).
The log is redirected to `stdout` so that the entire output can be
watched using `journalctl`.
Configuration parameters for the backends need to be set using
`services.statsd.extraConfig`, as each backend has its own options and
validating all of them explicitly by hand would not be practical.
The munin-node service used wrapProgram to inject environment variables.
This doesn't work because munin plugins depend on argv[0], which is
overwritten when the executable is a script with a shebang line (example
below).
This commit removes the wrappers and instead passes the required
environment variables to munin-node.
Eliminating the wrappers resulted in some broken plugins, e.g., meminfo
and hddtemp_smartctl. That was fixed with the per-plugin configuration.
Example:
The plugin if_eth0 is a symlink to /.../plugins/if_, which uses $0
to determine that it should monitor traffic on the eth0 interface.
if_ is a wrapped program, and runs `exec -a "$0" .if_-wrapped`
.if_-wrapped has a "#!/nix/.../bash" line, which results in bash
changing $0, and as a result the plugin thinks my interface
is called "-wrapped".
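The failure mode in the example above can be sketched in plain shell. The helper name `iface_from_name` is hypothetical, and the parameter expansion is an assumption about how a wildcard plugin such as if_ derives its target from the name it was invoked as:

```shell
#!/bin/sh
# Hypothetical helper: derive the interface name from the invocation name
# by stripping everything up to and including the last "if_".
iface_from_name() {
    printf '%s\n' "${1##*if_}"
}

# Invoked through the symlink, the name still carries the interface.
iface_from_name if_eth0

# Invoked as the wrapped script, the name is what the shebang interpreter
# sees, so the "interface" becomes the wrapper suffix.
iface_from_name .if_-wrapped
```

The first call yields the real interface name, the second yields the wrapper suffix, which is why passing the environment to munin-node directly avoids the problem.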
The behaviour has changed again. Listed collectors are now enabled in
addition to the default ones.
Also run as DynamicUser instead of user nobody, as the exporter doesn't need
any state.
* prometheus-collectd-exporter service: init module
Supports JSON and binary (optional) protocol
of collectd.
* nixos/prometheus-collectd-exporter: submodule is not needed for collectdBinary
"Builder called die: Cannot wrap
/nix/store/XXX-munin-available-plugins/plugin.sh because it is not an
executable file"
[Bjørn: Keep DRY, quote "$file".]
* removed pid-file support, it is unnecessary when running collectd as a systemd service
* removed static user id, as all the files are re-owned on service start
* added ambient capabilities for ping and smart (hdd health) functions
While systemd suggests using the pre-defined graphical-session user
target, I found that this interface is difficult to use. Additionally,
no other major distribution, even in its unstable versions, currently
uses this mechanism.
The window or desktop manager is supposed to run in a systemd user service
which activates graphical-session.target and the user services that are
binding to this target. The issue is that we can't elegantly pass the
xsession environment to the window manager session. In particular,
while the PassEnvironment option does work for DISPLAY, for some
mysterious reason it doesn't work for PATH.
This commit implements a new graphical user target that works just like
default.target. Services which should be run in a graphical session just
need to declare wantedBy graphical.target. The graphical target will be
activated in the xsession before executing the window or display manager.
Fixes #17858.
to /etc/dd-agent/conf.d by default, and make sure
/etc/dd-agent/conf.d is used.
Before NixOS 17.03, we were using dd-agent 5.5.X which
used configuration from /etc/dd-agent/conf.d
In NixOS 17.03 the agent first looks for conf.d at a relative location,
meaning that $out/agent/conf.d was used without NixOS overrides.
This change implements similar functionality as PR #25288, without
breaking backwards compatibility.
(cherry picked from commit 77c85b0ecb)
Adds services.longview.{apiKeyFile,mysqlPasswordFile} options as
alternatives to apiKey and mysqlPassword, which still work, but are
deprecated with a warning message.
Related to #24288.
The reason is less mental overhead when reading upstream
documentation: examples can be pasted right into the configuration
instead of being translated to a Nix attrset first.
The current default value of listenAddress = null blows up:
$ nixos-rebuild build
error: cannot coerce null to a string, at
.../nixpkgs/nixos/modules/services/monitoring/prometheus/alertmanager.nix:97:16
With listenAddress = "" we use the same default as upstream and there is
no blow up :-)
...by providing a default value of "no labels" (an empty attrset).
Without this change we get
$ nixos-rebuild test -I nixpkgs=.
building Nix...
building the system configuration...
error: The option `services.prometheus.scrapeConfigs.[definition 1-entry 1].static_configs.[definition 1-entry 1].labels' is used but not defined.
which is unneeded, because labels _are_ optional.
The structured options are incomplete compared to upstream and I think
it will be a maintenance burden to try to keep up. Instead, provide an
option for the raw config file contents (prometheus.yml).
The collectd service runs as an unprivileged user by default, so it does
not leak more information to its data directory than any user can obtain
elsewhere by other means.
If people are running it as root and are worried about information leaks,
we can add a collectd group and set the permissions to 750.
CC @offlinehacker.
Fixes #21198.
* influxdb module: add postStart
* cadvisor module: increase TimeoutStartSec
Under high load, the cadvisor module can take longer than the default 90
seconds to start. This change should hopefully fix the test on Hydra.
Systemd upstream provides targets for networking. This also includes a target network-online.target.
In this PR I remove / replace most occurrences since some of them were even wrong and could delay startup.
Every period, sa1 collects and stores data.
Every 24 hours, sa2 aggregates the previous day's data into a
report.
Timers and unit configurations were lifted from Fedora's default
units.
This commit implements the changes necessary to start up a graphite carbon Cache
with twisted and start the corresponding graphiteWeb service.
Dependencies need to be included via python buildEnv to include all recursive
implicit dependencies.
Additionally cairo is a requirement of graphiteWeb and pycairo is not a standard
python package (buildPythonPackage) and therefore cannot be included via
buildEnv. It also needs cairo in the Library PATH.
Accidentally broken by 4fede53c09
("nixos manuals: bring back package references").
Without this fix, grafana won't start:
$ systemctl status grafana
...
systemd[1]: Starting Grafana Service Daemon...
systemd[1]: Started Grafana Service Daemon.
grafana[666]: 2016/03/06 19:57:32 [log.go:75 Fatal()] [E] Failed to detect generated css or javascript files in static root (%!s(MISSING)), have you executed default grunt task?
systemd[1]: grafana.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: grafana.service: Unit entered failed state.
systemd[1]: grafana.service: Failed with result 'exit-code'.
This reverts most of 89e983786a, as those references are sanitized now.
Fixes #10039, at least most of it.
The `sane` case wasn't fixed, as it calls a *function* in pkgs to get
the default value.
- add missing types in module definitions
- add missing 'defaultText' in module definitions
- wrap example with 'literalExample' where necessary in module definitions
The advantage of putting the PID file under the ephemeral /run is that
when the machine crashes /run gets cleared allowing graphite to start
once the machine is rebooted.
We also set the PIDFile systemd option so that systemd knows the correct
PID and can remove the file after the service shuts down.
* package statsd node packages separately since they actually require
nodejs-0.10 or nodejs-0.12 to work (which is ... well, old)
* remove statsd packages and its backends from "global" node-packages.json.
I did not rebuild it since the npm2nix command fails for some reason. The next
time somebody reruns npm2nix, the statsd packages are going to be removed.
* statsd service: backends are now provided as strings and no longer as
packages.
The most complex problems were from dealing with switches reverted in
the meantime (gcc5, gmp6, ncurses6).
It's likely that darwin is (still) broken nontrivially.
Now it generates notifications for auto-detected devices as well as
for explicitly configured ones, sends well-formed e-mails and supports
immediate `wall` and `xmessage` notifications.
Better replace the double quotes in 'echo "${commands}"' with single
quotes, to prevent the shell from doing command substitution etc. at
configuration build time.
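A minimal sketch of the difference, using placeholder text rather than the module's real `${commands}` value (the variable names `double` and `single` are illustrative only):

```shell
#!/bin/sh
# With double quotes the shell performs command substitution on the text.
double="notify: $(echo injected)"

# With single quotes the text is kept literally.
single='notify: $(echo injected)'

printf '%s\n' "$double" "$single"
```

Single quotes only help as long as the interpolated text does not itself contain a single quote, so this is a mitigation rather than a complete escaping scheme.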
I'm not sure what exactly this user is needed for, i.e. under what circumstances
it must exist or not, but creating it unconditionally seems like the wrong thing
to do. I complained to @offlinehacker about this on GitHub, but got no response
for a week or so. I'm disabling the extraUsers bit to put out the fire, and now
hope that someone who actually knows about Graphite implements a proper solution
later.
This should make the syntactic representation of most examples in the
manual more consistent.
Thanks to @devhell for reporting.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
All activation scripts run serially upon boot and nixos-rebuild switch
etc., in contrast to preStart scripts, which run before a service starts
and can run in parallel with other services.
The munin(-node) activation script is particularly slow. Change it to a
preStart script so that it can run in parallel with other services and
not slow down boot (or nixos-rebuild switch).
This reduces (repeated) "nixos-rebuild test" time from ~16 seconds to ~8
on my (old) laptop.
- Upgrade Nagios Core to 4.x
- Expose mainConfigFile and cgiConfigFile in module for finer
configuration control.
- Upgrade Plugins to 2.x
- Remove default objectDefs, which users probably want to customize.
- Systemd-ify Nagios module and simplify directory structure
- Upgrade Nagios package with more modern patch, and ensure the
statedir is set to /var/lib/nagios
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Currently, the restartTriggers are abusing the systemd unit file in that
the cfg.carbon.config/storageAggregation/... option text is pasted into
the unit file. Even though this sort-of works (the service is restarted
if the config changes) this causes systemd to print error messages about
invalid sections (rightfully so!).
The correct use of restartTriggers is to list store paths, which is
what this change does. If any of the
cfg.carbon.config/storageAggregation/... options change, configDir will
get a new hash. It is not as "fine grained" as the current version, but
it is not abusing the interface.
Also, remove unneeded 'waitress' in one of the restartTriggers, because
it is already listed as part of the service config.
graphitePort must point to the port that carbon-cache listens on, not
the graphite webUI port.
With this change I finally got data from statsd to graphite.
It's "aggregation" with two 'g's.
Fixes this:
carbon-cache[9363]: [console] /nix/store/drxq4jj92sjk3cjik2l4hnsndbray3i4-graphite-config/storage-aggregation.conf not found, ignoring.
This overhauls the Datadog module a bit to be much more useful. In
particular, it adds support for nginx and postgresql monitoring
integrations to dd-agent. These have to exist in separate files under
/etc/dd-agent, so the module just exposes them as separate options. In
the future, more integrations could be added this way.
In the process of doing this, I also had to rename the dd-agent user to
datadog. Note the UIDs did not change, so this is strictly backwards
compatible. The reason for this is to make it easier to create a
'datadog' postgres user with access to pg_stats, as 'dd-agent' typically
isn't a valid username. This allows the out of the box configurations to
be used.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
mkdir -m will only set the permissions if it *creates* the directory.
Existing directories, with possibly wrong permissions, will not be
updated.
Use explicit chmod so permissions will always be correct.
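The behaviour described above can be reproduced in a scratch directory. This is a sketch only: the path and variable names are made up for illustration and are not taken from the module:

```shell
#!/bin/sh
# Hypothetical scratch location, not a path used by the module.
base=$(mktemp -d)
dir=$base/state

mkdir -m 0755 -p "$dir"   # directory now exists with the "wrong" mode
mkdir -m 0700 -p "$dir"   # already exists, so the requested mode is not applied
before=$(ls -ld "$dir" | cut -c1-10)

chmod 0700 "$dir"         # an explicit chmod always applies the mode
after=$(ls -ld "$dir" | cut -c1-10)

echo "after second mkdir: $before"
echo "after chmod:        $after"
rm -rf "$base"
```

The first listing should still show the mode from the initial creation, and only the explicit chmod changes it.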
The preStart snippets (graphite, carbon) try to create directories under
/var/db/. That currently fails because the code is run as user
"graphite". Fix by setting "PermissionsStartOnly = true" so that the
preStart stuff is run as 'root'.
Further:
* graphite-web-0.9.12/bin/build-index.sh needs perl, so add it to PATH.
* Now that preStart runs as root, we must wait with "chown graphite"
until we're done creating files/directories.
* Drop needless check for root (uid 0) before running chown.
Using pkgs.lib on the spine of module evaluation is problematic
because the pkgs argument depends on the result of module
evaluation. To prevent an infinite recursion, pkgs and some of the
modules are evaluated twice, which is inefficient. Using ‘with lib’
prevents this problem.
To be compatible with eb2f44c18c (Generate
/etc/passwd and /etc/group at build time). Without this you'll get this:
$ nixos-rebuild build
[...]
user-thrown exception: The option `users.extraGroups.unnamed-9.1.gid' is used but not defined.
(systemd service descriptions that is, not service descriptions in "man
configuration.nix".)
Capitalizing each word in the description seems to be the accepted
standard.
Also shorten these descriptions:
* "Munin node, the agent process" => "Munin Node"
* "Planet Venus, an awesome ‘river of news’ feed reader" => "Planet Venus Feed Reader"
Twisted provides an option to log to syslog, which enables nicer logging.
Consider what happens when an exception is raised: if logs are written to
stdout, the traceback lines are not merged into one entry, which gives ugly
logs. This commit fixes that.
This is also one of the official ways of starting carbon, so no worries.