While using libredirect in conjunction with geckodriver, I stumbled on
odd segfaults that happened when running the wrapped statx() call from
libredirect:
0x00007ffff7ddd541 in __strncmp_avx2 () from .../lib/libc.so.6
0x00007ffff7f6fe57 in statx () from .../lib/libredirect.so
0x00005555558d35bd in std::sys::unix::fs::try_statx::h2045d39b0c66d4e8 ()
0x00005555558d2230 in std::sys::unix::fs::stat::ha063998dfb361520 ()
0x0000555555714019 in mozversion::firefox_version::hdc3b57eb04947426 ()
0x00005555556a603c in geckodriver::capabilities::FirefoxCapabilities::version::h58e289917bd3c721 ()
0x00005555556a77f5 in <geckodriver::capabilities::FirefoxCapabilities as webdriver::capabilities::BrowserCapabilities>::validate_custom::h62d23cf9fd63b719 ()
0x000055555562a7c8 in webdriver::capabilities::SpecNewSessionParameters::validate::h60da250d33f0989f ()
0x00005555556d7a13 in <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold::h9427a360a3d0bf8f ()
0x0000555555669d85 in <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter::hd274d536ea29bb33 ()
0x00005555555c05ef in core::iter::adapters::try_process::hdf96a01ec1f9b8bd ()
0x000055555561768d in <webdriver::capabilities::SpecNewSessionParameters as webdriver::capabilities::CapabilitiesMatching>::match_browser::hfbd8c38f6db17e9f ()
0x00005555555ca6ef in <geckodriver::marionette::MarionetteHandler as webdriver::server::WebDriverHandler<geckodriver::command::GeckoExtensionRoute>>::handle_command::h13b98b9cb87a69d6 ()
0x00005555555e859e in webdriver::server::Dispatcher<T,U>::run::h746a8bf2f0bc24fd ()
0x000055555569ff0f in std::sys_common::backtrace::__rust_begin_short_backtrace::h3b920773bd467d2a ()
0x00005555555dbc99 in core::ops::function::FnOnce::call_once{{vtable.shim}}::h81ba7228877515f7 ()
0x00005555558d31a3 in std::sys::unix::thread::Thread::new::thread_start::h4514580219a899c5 ()
0x00007ffff7d0ce24 in start_thread () from .../lib/libc.so.6
0x00007ffff7d8e9b0 in clone3 () from .../lib/libc.so.6
I found this odd because it happens in the following piece of code
(shortened a bit):
    1 static const char * rewrite(const char * path, char * buf)
    2 {
    3     if (path == NULL) return path;
    4     for (int n = 0; n < nrRedirects; ++n) {
    5         int len = strlen(from[n]);
    6         if (strncmp(path, from[n], len) != 0) continue;
    7         if (snprintf(buf, PATH_MAX, "%s%s", to[n], path + len) >= PATH_MAX)
    8             abort();
    9         return buf;
   10     }
   11     return path;
   12 }
When inspecting the assembly, I found that the null pointer check in
line 3 was missing entirely: the code entered the loop directly and then
segfaulted when strncmp() was called with a null pointer as its first
argument.
I confirmed that the check was indeed missing by compiling libredirect
with "-O0" and comparing the generated assembly with the optimized one.
The build compiled with "-O0" had the check while the optimized one did
not, and geckodriver ran fine with the unoptimized version.
Digging through the Git history, I found 5677ce2008, which introduced
the null pointer check. At that commit, however, the check was still
present in the generated assembly. So I bisected between that commit and
the most recent one and ended up with commit ca8aa5dc87, which moved
everything to GCC 7.
I haven't found out why *exactly* GCC optimizes the check away, but
experimenting with various compilers on Godbolt suggests that others,
such as Clang, do it as well. Additionally, given that passing NULL to
stat() is UB, my guess is that compilers assume such an argument can't
be NULL. This assumption is based on the fact that GCC warns with
"argument 1 null where non-null expected" when passing NULL to e.g.
stat().
To address this for now, I marked the path argument of rewrite() as
volatile and also added a test that should segfault if this ever
regresses again, as it already did once.
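A minimal, self-contained sketch of the workaround, under stated assumptions: the one-entry `from`/`to` tables and the `wrapped()` function are hypothetical stand-ins for libredirect's configuration and its `statx()` wrapper. The GCC/Clang `nonnull` attribute on `wrapped()` mimics the kind of annotation libc headers put on `stat()`-family prototypes, which is also what triggers the warning quoted above. Qualifying the pointer object itself as `volatile` forces the compiler to actually re-read `path` inside `rewrite()`, so it cannot carry a non-null assumption into it after inlining.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* fallback if <limits.h> does not provide it */
#endif

/* Hypothetical redirect table: paths under /etc/ go to /tmp/etc/. */
static const char * from[] = { "/etc/" };
static const char * to[] = { "/tmp/etc/" };
static const int nrRedirects = 1;

/* `volatile` applies to the pointer object, not to the characters it
   points at: every read of `path` must happen, so a non-null assumption
   inherited from a caller cannot be used to delete the NULL check. */
static const char * rewrite(const char * volatile path, char * buf)
{
    if (path == NULL) return path;
    for (int n = 0; n < nrRedirects; ++n) {
        size_t len = strlen(from[n]);
        if (strncmp(path, from[n], len) != 0) continue;
        if (snprintf(buf, PATH_MAX, "%s%s", to[n], path + len) >= PATH_MAX)
            abort();
        return buf;
    }
    return path;
}

/* Hypothetical wrapper whose prototype promises a non-null path, like a
   libc prototype annotated with nonnull. Passing NULL here is undefined
   behaviour; without `volatile` above, GCC may drop the NULL check once
   rewrite() is inlined into this function. */
const char * wrapped(const char * path, char * buf) __attribute__((nonnull(1)));

const char * wrapped(const char * path, char * buf)
{
    return rewrite(path, buf);
}
```

Calling `rewrite(NULL, buf)` directly is the kind of check a regression test can make: it should return `NULL` instead of crashing inside `strncmp()`.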
Signed-off-by: aszlig <aszlig@nix.build>
Derivations built with `writeShellScriptBin`
should always be runnable with `nix run`. At present,
the derivation is missing both `meta.mainProgram`
and `pname`, which means that `nix run` falls back
to inferring the bin path from `name`. This is
unreliable and depends on faulty heuristics.
For context, see the following excerpt from
`nix run --help`:
If installable evaluates to a derivation, it will try to execute the
program <out>/bin/<name>, where out is the primary output store path
of the derivation, and name is the first of the following that exists:
· The meta.mainProgram attribute of the derivation.
· The pname attribute of the derivation.
· The name part of the value of the name attribute of the derivation.
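To make the fallback concrete, here is a small C sketch of that lookup order. The `struct drv_attrs` type and both functions are hypothetical and are not Nix's implementation; the name-part rule (cut at the first `-` that is not followed by a letter) is assumed to match the documented behaviour of `builtins.parseDrvName`. It shows why relying on `name` is fragile: a script called `my-script-2` with no `pname` yields the name part `my-script`, which is not the file in `bin/`.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical view of the attributes `nix run` consults. */
struct drv_attrs {
    const char *main_program; /* meta.mainProgram, or NULL if unset */
    const char *pname;        /* pname, or NULL if unset */
    const char *name;         /* name, always present */
};

/* Copy the "name part" of `name` into `out`: everything before the first
   '-' that is not followed by a letter (assumed parseDrvName rule). */
static void name_part(const char *name, char *out, size_t outlen)
{
    size_t i = 0;
    for (; name[i] != '\0'; ++i) {
        if (name[i] == '-' && name[i + 1] != '\0' &&
            !isalpha((unsigned char)name[i + 1]))
            break;
    }
    snprintf(out, outlen, "%.*s", (int)i, name);
}

/* Pick the program name in the order listed by `nix run --help`. */
static const char *program_name(const struct drv_attrs *d,
                                char *buf, size_t buflen)
{
    if (d->main_program != NULL) return d->main_program;
    if (d->pname != NULL) return d->pname;
    name_part(d->name, buf, buflen);
    return buf;
}
```

Once `meta.mainProgram` (or `pname`) is set, one of the first two branches wins and the name heuristic never runs.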
This is very useful in conjunction with `meta.pkgConfigModules`, as the
new tester can use the list provided by this meta attribute as the
default value for `moduleNames`, which makes it very convenient to use
in `passthru.tests`.
For backwards compatibility, a shim under the old name is maintained
with a warning.
vcunat said:
> This invocation of mktemp creates the file in the current directory, which is bad practice. We should add "--tmpdir=$TMPDIR" or make the template absolute.
> I noticed because one package did cd $src during installing, which is a read-only path...
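The same advice carries over to C: `mkstemp()` creates the file wherever its template points, so an absolute template under `$TMPDIR` keeps it out of the current directory. The helper name `make_tmpfile` and the `/tmp` fallback are illustrative choices, not part of the original discussion, and the sketch assumes `$TMPDIR` is an absolute path.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> /* close() and unlink() for the caller's cleanup */

/* Hypothetical helper: create a temporary file from a template under
   $TMPDIR (falling back to /tmp), never relative to the current
   directory, which may be read-only. Assumes $TMPDIR is absolute.
   Returns the open fd, or -1 on failure; `path` receives the file name. */
static int make_tmpfile(char *path, size_t pathlen)
{
    const char *dir = getenv("TMPDIR");
    if (dir == NULL || dir[0] == '\0') dir = "/tmp";
    int n = snprintf(path, pathlen, "%s/tmp.XXXXXX", dir);
    if (n < 0 || (size_t)n >= pathlen) return -1;
    return mkstemp(path);
}
```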
The Darwin stdenv rework conditionally sets `NIX_CC_USE_RESPONSE_FILE`
depending on the `ARG_MAX` of the build system. If it is at least 1 MiB,
the stdenv passes the arguments on the command line (like Linux).
Otherwise, it falls back to the response file. This was done to prevent
intermittent failures caused by clang 16 being unable to read the
response file. Unfortunately, this breaks `gccStdenv` on older Darwin
platforms.
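Expressed in C, the threshold rule reads as follows. The helper names are hypothetical and the stdenv itself does not run this code; `sysconf(_SC_ARG_MAX)` is simply the POSIX way for a program to read the same limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <unistd.h>

/* Threshold from the description above: 1 MiB. */
#define ARG_MAX_THRESHOLD (1024L * 1024L)

/* Hypothetical decision helper: pass arguments on the command line when
   ARG_MAX is at least 1 MiB, otherwise fall back to a response file.
   A negative value (limit unknown) counts as "small". */
static bool use_response_file(long arg_max)
{
    return arg_max < ARG_MAX_THRESHOLD;
}

/* Query the running system's ARG_MAX; sysconf() returns -1 if the limit
   is indeterminate. */
static long system_arg_max(void)
{
    return sysconf(_SC_ARG_MAX);
}
```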
Note: While the stdenv logic will also be reverted, this change is
needed for compatibility with clang 16.
GCC is capable of using a response file, but it does not work correctly
when the response file is a file descriptor (such as the `/dev/fd/N`
path produced by process substitution). This can be reproduced using the
following sequence of commands:
$ nix shell nixpkgs#gcc; NIX_CC_USE_RESPONSE_FILE=1 gcc
# Linux
/nix/store/9n9gjvzci75gp2sh1c4rh626dhizqynl-binutils-2.39/bin/ld: unrecognized option '-B/nix/store/vnwdak3n1w2jjil119j65k8mw1z23p84-glibc-2.35-224/lib/'
/nix/store/9n9gjvzci75gp2sh1c4rh626dhizqynl-binutils-2.39/bin/ld: use the --help option for usage information
collect2: error: ld returned 1 exit status
# Darwin
ld: unknown option: -mmacosx-version-min=11.0
collect2: error: ld returned 1 exit status
Instead of using process substitution, create a temporary file and
remove it in a trap. This should also prevent the intermittent build
failures with clang 16 on older Darwin systems.
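A C sketch of the same approach, under stated assumptions: write the arguments to a real file under `$TMPDIR`, hand the compiler `@path`, and delete the file at exit, with `atexit()` standing in for the shell `trap` (unlike a trap, it only runs on normal exit). All names are hypothetical. The escaping follows GCC's documented `@file` syntax, where any character may be escaped with a backslash; empty arguments are not handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Path of the current response file; empty until one is created. */
static char rsp_path[4096];

/* Cleanup hook, the analogue of removing the file in a shell trap. */
static void remove_response_file(void)
{
    if (rsp_path[0] != '\0') unlink(rsp_path);
}

/* Write one argument per line, backslash-escaping whitespace, quotes and
   backslashes so the compiler reads it back as a single argument. */
static void write_arg(FILE *f, const char *arg)
{
    for (; *arg != '\0'; ++arg) {
        if (strchr(" \t\n\v\f\r'\"\\", *arg) != NULL) fputc('\\', f);
        fputc(*arg, f);
    }
    fputc('\n', f);
}

/* Create the response file under $TMPDIR (or /tmp), register its removal
   and format "@path" into at_arg. Returns at_arg, or NULL on failure.
   Intended to be called once per process. */
static const char *make_response_file(int argc, char *const argv[],
                                      char *at_arg, size_t at_len)
{
    const char *dir = getenv("TMPDIR");
    if (dir == NULL || dir[0] == '\0') dir = "/tmp";
    int n = snprintf(rsp_path, sizeof rsp_path, "%s/cc-params.XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof rsp_path) { rsp_path[0] = '\0'; return NULL; }
    int fd = mkstemp(rsp_path);
    if (fd < 0) { rsp_path[0] = '\0'; return NULL; }
    atexit(remove_response_file);
    FILE *f = fdopen(fd, "w");
    if (f == NULL) { close(fd); return NULL; }
    for (int i = 0; i < argc; ++i) write_arg(f, argv[i]);
    if (fclose(f) != 0) return NULL;
    n = snprintf(at_arg, at_len, "@%s", rsp_path);
    if (n < 0 || (size_t)n >= at_len) return NULL;
    return at_arg;
}
```

Because the file is a regular path rather than `/dev/fd/N`, any tool that reopens or rereads it sees the same contents.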
Fixes #245167
Before this change, the hook could run `strip` on the same file more
than once when that file was reachable through multiple link paths. In
the case of `gcc`, `libgcc.a` was stripped multiple times in parallel,
which produced a corrupted archive.
The change passes the inputs through `realpath | uniq` to make sure we
don't attempt to strip the same file multiple times.