nixpkgs/pkgs/development/interpreters
Greg Price 480c8d1991 cpython: Optimize dynamic symbol tables, for a 6% speedup.
I took a close look at how Debian builds the Python interpreter,
because I noticed it ran substantially faster than the one in nixpkgs
and I was curious why.

One thing that I found made a material difference in performance was
this pair of linker flags (passed to the compiler):

    -Wl,-O1 -Wl,-Bsymbolic-functions

In other words, the linker itself effectively receives the flags:

    -O1 -Bsymbolic-functions
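
As a rough illustration of that forwarding, here is a minimal sketch
of how a GCC-style driver turns `-Wl,` arguments into linker
arguments.  The helper name `linker_args` is made up for this
example, and it models only the `-Wl,` form (splitting at commas),
not every way a driver can pass options to the linker.

```python
def linker_args(compiler_args):
    """Collect the arguments a GCC-style driver forwards to the linker.

    Each `-Wl,a,b` argument is split at the commas and forwarded as the
    separate linker arguments `a` and `b`.  All other arguments are
    ignored in this sketch.
    """
    forwarded = []
    for arg in compiler_args:
        if arg.startswith("-Wl,"):
            forwarded.extend(arg[len("-Wl,"):].split(","))
    return forwarded


print(linker_args(["-O2", "-Wl,-O1", "-Wl,-Bsymbolic-functions"]))
# prints ['-O1', '-Bsymbolic-functions']
```

Note that the plain `-O2` stays with the compiler; only the `-Wl,`
prefixed `-O1` reaches the linker.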

Doing the same thing in nixpkgs turns out to make the interpreter
run about 6% faster, which is quite a big win for such an easy
change.  So, let's apply it.
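
One way to check which link flags a given interpreter recorded at
build time is the standard `sysconfig` module.  This is only a
sketch: the helper names are made up, and it assumes the flags were
passed through `LDFLAGS` at configure time.  A build that injects
them some other way (for example through a compiler wrapper) may not
show them here.

```python
import sysconfig


def recorded_link_flags():
    """Return the LDFLAGS recorded for this interpreter, split on whitespace."""
    # get_config_var returns None when the variable is not recorded.
    return (sysconfig.get_config_var("LDFLAGS") or "").split()


def has_flags(flags, wanted=("-Wl,-O1", "-Wl,-Bsymbolic-functions")):
    """Map each wanted flag to whether it appears in `flags`."""
    return {flag: flag in flags for flag in wanted}


print(has_flags(recorded_link_flags()))
```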

---

I had not known there was a `-O1` flag for the *linker*!
But indeed there is.

These flags are unrelated to "link-time optimization" (LTO), despite
the latter's name.  LTO means doing classic compiler optimizations
on the actual code, at the linking step when it becomes possible to
do them with cross-object-file information.  These two flags, by
contrast, cause the linker to make certain optimizations within the
scope of its job as the linker.

Documentation is here, though sparse:
  https://sourceware.org/binutils/docs-2.31/ld/Options.html

The meaning of -O1 was explained in more detail in this LWN article:
  https://lwn.net/Articles/192624/
Apparently it makes the resulting symbol table use a bigger hash
table, so the load factor is smaller and lookups are faster.  Cool.
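
To make the load-factor point concrete, here is a toy model of a
chained hash table keyed by symbol name.  It uses the classic SysV
ELF hash function, but the symbol names and bucket counts are made
up, and it is not the linker's actual sizing algorithm or table
layout.  It only shows that more buckets for the same symbols means
shorter chains to walk.

```python
def elf_hash(name):
    """Classic SysV ELF symbol hash, kept to 32 bits."""
    h = 0
    for c in name.encode():
        h = ((h << 4) + c) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def average_probes(names, nbuckets):
    """Average chain position of a symbol, i.e. comparisons per successful lookup."""
    chain_length = {}
    total = 0
    for name in names:
        bucket = elf_hash(name) % nbuckets
        chain_length[bucket] = chain_length.get(bucket, 0) + 1
        total += chain_length[bucket]  # position of this name in its chain
    return total / len(names)


# Hypothetical symbol names; a real library's names look different.
names = [f"sym{i:03d}" for i in range(1000)]
for nbuckets in (127, 1021):
    print(nbuckets, average_probes(names, nbuckets))
```

With the same 1000 names, the 1021-bucket table needs fewer
comparisons per lookup on average than the 127-bucket one.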

As for -Bsymbolic-functions, the documentation indicates that it's a
way of saving lookups through the symbol table entirely.  There can
apparently be situations where it changes the behavior of a program,
specifically if the program relies on linker tricks to provide
customization features:
  https://bugs.launchpad.net/ubuntu/+source/xfe/+bug/644645
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637184#35
But I'm pretty sure CPython doesn't permit that kind of trick: you
don't load a shared object that tries to redefine some symbol found
in the interpreter core.
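
The behavior change can be sketched with a toy model of symbol
resolution.  Everything here is hypothetical (the `resolve` helper,
the object names, the symbol names).  It models only the basic idea:
by default a call from a shared library to a global function is
resolved by searching the loaded objects in order, whereas with
-Bsymbolic-functions the reference is bound to the library's own
definition if it has one.

```python
def resolve(symbol, caller, load_order, definitions, bind_locally=False):
    """Return the object whose definition of `symbol` a call from `caller` reaches."""
    # With bind_locally, prefer the caller's own definition and skip the search.
    if bind_locally and symbol in definitions[caller]:
        return caller
    # Otherwise the first object in load order that defines the symbol wins.
    for obj in load_order:
        if symbol in definitions[obj]:
            return obj
    raise LookupError(symbol)


definitions = {
    "app": {"helper"},                # hypothetical program overriding `helper`
    "libcore.so": {"helper", "run"},  # hypothetical library calling its own `helper`
}
load_order = ["app", "libcore.so"]

print(resolve("helper", "libcore.so", load_order, definitions))
# prints app
print(resolve("helper", "libcore.so", load_order, definitions, bind_locally=True))
# prints libcore.so
```

The first result is the "linker trick" case: the program's
definition overrides the library's.  The second is what the flag
gives you: no search, and no override.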

The stronger reason I'm confident using -Bsymbolic-functions is
safe, though, is empirical.  Both Debian and Ubuntu have been
shipping a Python built this way for over a decade: the flag was
introduced for Python 2.4 and 2.5 in Ubuntu "hardy" and Debian
"lenny", released in 2008 and 2009 respectively.  In the 12 years
since, they haven't seen a need to drop this flag, and I've been
unable to locate any reports
of trouble related to it, either on the Web in general or on the
Debian bug tracker.  (There are reports of a handful of other
programs breaking with it, but not Python/CPython.)  So that seems
like about as thorough testing as one could hope for.

---

As for the performance impact: I ran CPython upstream's preferred
benchmark suite, "pyperformance", in the same way as described in
the previous commit.  On top of that commit's change, the results
across the 60 benchmarks in the suite are:

The median is 6% faster.

The middle half (aka interquartile range) is from 4% to 8% faster.

Out of 60 benchmarks, 3 come out slower, by 1-4%.  At the other end,
5 are at least 10% faster, and one is 17% faster.
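
For reference, summary figures of this kind can be computed from
per-benchmark results along the following lines.  This is a sketch
only: the `summarize` helper is made up, and the sample numbers are
invented for illustration, not the actual pyperformance results.

```python
import statistics


def summarize(speedups_pct):
    """Summarize per-benchmark speedups given in percent (negative means slower)."""
    q1, _, q3 = statistics.quantiles(speedups_pct, n=4)
    return {
        "median": statistics.median(speedups_pct),
        "iqr": (q1, q3),  # the middle half of the benchmarks lies in this range
        "slower": sum(1 for s in speedups_pct if s < 0),
        "at_least_10": sum(1 for s in speedups_pct if s >= 10),
    }


# Invented sample data, NOT the measured results.
sample = [-2, 3, 4, 5, 6, 6, 7, 8, 11, 17]
print(summarize(sample))
```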

So, that's quite a material speedup!  I don't know how big the
effect of these flags is for other software; but certainly CPython
tends to do plenty of dynamic linking, as that's how it loads
extension modules, which are ubiquitous in the stdlib as well as
popular third-party libraries.  So perhaps that helps explain why
speeding up dynamic symbol lookup has such an impact.
2020-05-13 21:24:30 -07:00
acl2
angelscript
bats
ceptre
chibi
clips
clisp
clojure clojure 1.10.1.492 -> 1.10.1.507 plus bugfix (#79868) 2020-02-12 11:50:50 +00:00
clojurescript/lumo cleanup 2020-02-19 23:40:14 +01:00
dart dart: 2.0.0 -> 2.7.1 (stable) + 2.0.0 -> 2.8.0-dev.10.0 (dev) 2020-02-27 14:23:27 +01:00
dhall Add Nixpkgs support for Dhall 2020-02-11 22:02:53 -08:00
duktape duktape: 2.4.0 -> 2.5.0 2019-12-08 21:47:09 +01:00
eff Treewide: fix more URL permanent redirects 2019-11-21 15:37:34 -08:00
elixir elixir_1_10: 1.10.1 -> 1.10.2 2020-02-26 13:12:21 +01:00
erlang erlangR22: 22.1.7 -> 22.3 2020-03-17 06:54:06 +01:00
evcxr evcxr: upgrade cargo fetcher and cargoSha256 2020-02-15 22:09:05 -08:00
falcon
gauche gauche: 0.9.8 -> 0.9.9 2019-12-16 20:20:20 -05:00
gnu-apl treewide: NIX_*_FLAGS -> string 2019-12-31 00:15:46 +01:00
groovy groovy: 2.5.9 -> 3.0.0 2020-02-12 12:34:24 +00:00
gtk-server gtk-server: 2.3.1 -> 2.4.5 2019-12-15 13:31:53 -08:00
guile guile: 2.2.6 -> 2.2.7 2020-03-11 23:56:38 +00:00
hugs
hy hy: use python2, build fails with 3 2019-11-13 16:27:38 +01:00
icon-lang icon-lang: fix build 2020-02-23 17:22:13 +01:00
io treewide: Get rid of libGLU_combined 2019-11-18 20:10:43 +00:00
j
janet janet: 1.6.0 -> 1.7.0 2020-02-07 10:54:34 +00:00
jimtcl treewide: NIX_*_FLAGS -> string 2019-12-31 00:15:46 +01:00
joker Revert "Merge pull request #83099 from marsam/fix-buildGoModule-packages-darwin" 2020-03-27 07:33:21 +00:00
jruby jruby: 9.2.10.0 -> 9.2.11.0 2020-03-05 02:42:20 +00:00
jython jython: 2.7.2b2 -> 2.7.2b3 2020-02-23 13:41:27 -08:00
kona treewide: replace make/build/configure/patchFlags with nix lists 2019-12-30 12:58:11 +01:00
lfe lfe: 1.2.1 -> 1.3 2020-02-10 20:03:47 +01:00
lolcode
love treewide: NIX_*_FLAGS -> string 2019-12-31 00:15:46 +01:00
lua-5 buidLuarocksPackage: add a checkPhase 2020-02-26 01:14:30 +01:00
luajit luajit: Expose build options, enable JIT debug module 2020-02-15 18:40:02 +01:00
lush treewide: Get rid of libGLU_combined 2019-11-18 20:10:43 +00:00
maude maude: update from version 2.7.1 to 3.0 (including full-maude) 2020-01-31 16:25:40 +01:00
metamath metamath: 0.180 -> 0.181 2020-03-14 03:24:44 +00:00
micropython micropython: init at 1.12 2020-01-03 10:57:55 +01:00
mujs
nix-exec
octave Build octave on macos 2020-03-23 06:31:11 +00:00
perl perl: Enable threading on darwin 2020-02-20 08:35:45 +01:00
php php: get rid of with lib; used on entire file 2020-03-15 18:04:57 +01:00
picoc
picolisp picolisp: 19.6 -> 19.12 2020-01-06 22:25:42 -08:00
pixie
proglodyte-wasm
pure pure: mark as broken 2020-01-30 18:35:30 -05:00
pyrex
python cpython: Optimize dynamic symbol tables, for a 6% speedup. 2020-05-13 21:24:30 -07:00
qnial
quickjs treewide: Remove myself from maintainers on some packages (#78027) 2020-01-19 12:18:34 -05:00
racket racket: enable building on aarch64 2020-03-16 15:23:31 +01:00
rakudo rakudo: 2020.02 -> 2020.02.1 2020-03-27 02:00:17 -07:00
rascal
rebol
red
regina
renpy treewide: Get rid of libGLU_combined 2019-11-18 20:10:43 +00:00
ruby ruby_2_4: remove 2020-02-10 13:23:35 -05:00
scheme48
scsh
self
spidermonkey treewide: replace make/build/configure/patchFlags with nix lists 2019-12-30 12:58:11 +01:00
supercollider supercollider: 3.10.3 -> 3.10.4 2020-01-30 11:03:45 +00:00
tcl tcl: fix dangling symlink 2019-12-19 09:46:36 -05:00
tinyscheme
unicon-lang
wasmer wasmer: 0.13.0 -> 0.16.2 2020-03-12 19:06:47 +01:00
wasmtime wasmtime: bump to v0.12.0 (from v0.8.0) 2020-03-01 02:34:43 +01:00