63 Commits

Author SHA1 Message Date
yukang
3f27e4b3ea clean up potential_query_instability with FxIndexMap and UnordMap 2024-02-14 18:36:37 +08:00
Matthias Krüger
5a2cec2615
Rollup merge of #120602 - klensy:mono-comment, r=nnethercote
rustc_monomorphize: fix outdated comment in partition

`max_cgu_count` was removed in 51821515b3, but the comment was not updated (its usage in `merge_codegen_units` was removed earlier).

r? `@nnethercote`
2024-02-06 19:40:07 +01:00
Michael Goulet
ca44416023 Fix drop shim for AsyncFnOnce closure, AsyncFnMut shim for AsyncFn closure 2024-02-06 02:22:58 +00:00
Michael Goulet
427896dd7e Construct body for by-move coroutine closure output 2024-02-06 02:22:58 +00:00
Michael Goulet
fc4fff4038 Build a shim to call async closures with different AsyncFn trait kinds 2024-02-06 02:22:58 +00:00
klensy
17f0919e8a rustc_monomorphize: fix outdated comment in partition 2024-02-03 14:50:24 +03:00
Nadrieril
e8d1c2ef9c
Rollup merge of #118811 - EbbDrop:is-sorted-by-bool, r=Mark-Simulacrum
Use `bool` instead of `PartialOrd` as return value of the comparison closure in `{slice,Iterator}::is_sorted_by`

Changes the function signature of the closure given to `{slice,Iterator}::is_sorted_by` to return a `bool` instead of a `PartialOrd`, as suggested by the libs-api team here: https://github.com/rust-lang/rust/issues/53485#issuecomment-1766411980.

This means these functions now return true if the closure returns true for all the pairs of values.
2024-01-21 06:38:35 +01:00
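The semantics described in #118811 above (the check passes when the closure returns true for every adjacent pair) can be sketched without relying on the library methods themselves. The helper name `is_sorted_by_bool` is hypothetical and is not the standard library implementation; it only illustrates the bool-returning closure contract.

```rust
/// Returns true if `compare` holds for every adjacent pair of elements.
/// Hypothetical stand-in for the bool-returning closure contract.
fn is_sorted_by_bool<T>(items: &[T], mut compare: impl FnMut(&T, &T) -> bool) -> bool {
    // An empty or single-element slice has no pairs, so it counts as sorted.
    items.windows(2).all(|pair| compare(&pair[0], &pair[1]))
}

fn main() {
    let values = [1, 2, 2, 9];
    // Non-decreasing order accepts the repeated 2.
    println!("non-decreasing: {}", is_sorted_by_bool(&values, |a, b| a <= b));
    // Strictly increasing order rejects it.
    println!("strictly increasing: {}", is_sorted_by_bool(&values, |a, b| a < b));
}
```

With a `bool` result the caller picks strict or non-strict ordering directly through the comparison operator used in the closure.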
EbbDrop
606eeb84ad Use bool instead of PartialOrd in is_sorted_by 2024-01-20 21:38:34 +01:00
Nicholas Nethercote
3c4f1d85af Rename {create,emit}_warning as {create,emit}_warn.
For consistency with `warn`/`struct_warn`, and also `{create,emit}_err`,
all of which use an abbreviated form.
2024-01-10 07:33:06 +11:00
Nicholas Nethercote
8a9db25459 Remove more Session methods that duplicate DiagCtxt methods. 2023-12-24 08:17:47 +11:00
Nicholas Nethercote
99472c7049 Remove Session methods that duplicate DiagCtxt methods.
Also add some `dcx` methods to types that wrap `TyCtxt`, for easier
access.
2023-12-24 08:05:28 +11:00
bors
1559dd2dbf Auto merge of #118770 - saethlin:fix-inline-never-uses, r=nnethercote
Fix cases where std accidentally relied on inline(never)

This PR increases the power of `-Zcross-crate-inline-threshold=always` so that it applies through `#[inline(never)]`. Note that though this is called "cross-crate-inlining" in this case especially it is _just_ lazy per-CGU codegen. The MIR inliner and LLVM still respect the attribute as much as they ever have.

Trying to bootstrap with the new `-Zcross-crate-inline-threshold=always` change revealed two bugs:

We have special intrinsics `assert_inhabited`, `assert_zero_valid`, and `assert_mem_uninitialized_valid` which codegen backends will lower to nothing or a call to `panic_nounwind`. Since we may not have any call to `panic_nounwind` in MIR but emit one anyway, we need to specially tell `MirUsedCollector` about this situation.

`#[lang = "start"]` is special-cased already so that `MirUsedCollector` will collect it, but then when we make it cross-crate-inlinable it is only assigned to a CGU based on whether `MirUsedCollector` saw a call to it, which of course we didn't.

---

I started looking into this because https://github.com/rust-lang/rust/pull/118683 revealed a case where we were accidentally relying on a function being `#[inline(never)]`, and cranking up cross-crate-inlinability seems like a way to find other situations like that.

r? `@nnethercote` because I don't like what I'm doing to the CGU partitioning code here but I can't come up with something much better
2023-12-15 04:54:14 +00:00
Ben Kimock
e559172249 Fix cases where std accidentally relied on inline(never) 2023-12-14 08:30:36 -05:00
Lukasz Anforowicz
981c4e3ce6 Add unstable -Zdefault-hidden-visibility cmdline flag for rustc.
The new flag has been described in the Major Change Proposal at
https://github.com/rust-lang/compiler-team/issues/656
2023-12-13 21:14:23 +00:00
Nilstrieb
21a870515b Fix clippy::needless_borrow in the compiler
`x clippy compiler -Aclippy::all -Wclippy::needless_borrow --fix`.

Then I had to remove a few unnecessary parens and muts that were exposed
now.
2023-11-21 20:13:40 +01:00
Zalathar
6f1ca8d9eb coverage: Change query codegened_and_inlined_items to a plain function
This query has a name that sounds general-purpose, but in fact it has
coverage-specific semantics, and (fortunately) is only used by coverage code.

Because it is only ever called once (from one designated CGU), it doesn't need
to be a query, and we can change it to a regular function instead.
2023-10-21 12:20:05 +11:00
lcnr
3c52a3e280 subst -> instantiate 2023-09-26 09:37:55 +02:00
Deadbeef
a0a801cd38 treat host effect params as erased generics in codegen
This fixes the changes brought to codegen tests when effect params are
added to libcore, by not attempting to monomorphize functions that get
the host param by being `const fn`.
2023-09-14 07:34:35 +00:00
Matthias Krüger
c3cd05198a
Rollup merge of #113872 - nnethercote:tweak-cgu-sorting, r=pnkfelix
Tweak CGU sorting in a couple of places.

In `base.rs`, tweak how the CGU size interleaving works. Since #113777, it's much more common to have multiple CGUs with identical sizes. With the existing code these same-sized items ended up in the opposite-to-desired order due to the stable sorting. The code now starts with a reverse sort (like is done in `partitioning.rs`) which gives the behaviour we want. This doesn't matter much for perf, but makes profiles in `samply` look more like what we expect.

In `partitioning.rs`, we can use `sort_by_key` instead of `sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is an identical CGU sort earlier in that function that already uses `sort_by_key`.)

r? `@pnkfelix`
2023-07-27 06:04:12 +02:00
Matthias Krüger
af2b370100 more clippy::style fixes:
get_first
single_char_add_str
unnecessary_mut_passed
manual_map
manual_is_ascii_check
2023-07-23 23:39:04 +02:00
Matthias Krüger
ed4c5fef72 fix some clippy::style findings
comparison_to_empty
iter_nth_zero
for_kv_map
manual_next_back
redundant_pattern
2023-07-23 23:36:56 +02:00
Nicholas Nethercote
8c31219d5c Tweak CGU sorting in a couple of places.
In `base.rs`, tweak how the CGU size interleaving works. Since #113777,
it's much more common to have multiple CGUs with identical sizes. With
the existing code these same-sized items ended up in the
opposite-to-desired order due to the stable sorting. The code now starts
with a reverse sort (like is done in `partitioning.rs`) which gives the
behaviour we want. This doesn't matter much for perf, but makes profiles
in `samply` look more like what we expect.

In `partitioning.rs`, we can use `sort_by_key` instead of
`sort_by_cached_key` because `CGU::size_estimate()` is cheap. (There is
an identical CGU sort earlier in that function that already uses
`sort_by_key`.)
2023-07-20 09:58:13 +10:00
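A minimal sketch of the reverse sort described in 8c31219d5c above. The `Cgu` struct here is a hypothetical stand-in, not the compiler's `CodegenUnit`. Because the key is a plain field read, `sort_by_key` is sufficient and caching keys would gain nothing. The sort is stable, so same-sized CGUs keep their relative order.

```rust
use std::cmp::Reverse;

/// Hypothetical stand-in for a codegen unit with a cheap size estimate.
#[derive(Debug, Clone, PartialEq)]
struct Cgu {
    name: &'static str,
    size_estimate: usize,
}

/// Sort largest-first; ties keep their original relative order (stable sort).
fn sort_largest_first(cgus: &mut [Cgu]) {
    cgus.sort_by_key(|cgu| Reverse(cgu.size_estimate));
}

fn main() {
    let mut cgus = vec![
        Cgu { name: "a", size_estimate: 10 },
        Cgu { name: "b", size_estimate: 30 },
        Cgu { name: "c", size_estimate: 10 },
        Cgu { name: "d", size_estimate: 20 },
    ];
    sort_largest_first(&mut cgus);
    for cgu in &cgus {
        println!("{} {}", cgu.name, cgu.size_estimate);
    }
}
```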
Nicholas Nethercote
05de5d6f64 Change the primary CGU merging algorithm.
Instead of repeatedly merging the two smallest CGUs, we now use a
merging algorithm that aims to minimize the duplication of inlined
functions.

`exa-0.10.1` was one benchmark that saw particularly good results. The
old CGU stats:
```
INTERNALIZE
- unique items: 2774 (1216 root + 1558 inlined), unique size: 122065 (77219 root + 44846 inlined)
- placed items: 3834 (1216 root + 2618 inlined), placed size: 154552 (77219 root + 77333 inlined)
- placed/unique items ratio: 1.38, placed/unique size ratio: 1.27
- CGUs: 16, mean size: 9659.5, sizes: [11791, 11634, 11173, 10987, 10939, 10507, 9992, 9813, 9593, 9580, 9030, 8447, 7975, 7961, 7876, 7254]
```
The new CGU stats:
```
INTERNALIZE
- unique items: 2774 (1216 root + 1558 inlined), unique size: 122065 (77219 root + 44846 inlined)
- placed items: 3626 (1216 root + 2410 inlined), placed size: 147201 (77219 root + 69982 inlined)
- placed/unique items ratio: 1.31, placed/unique size ratio: 1.21
- CGUs: 16, mean size: 9200.1, sizes: [11634, 10939, 10227, 9555, 9178, 9167, 8879, 8804, 8604, 8603 (x3), 8602 (x2), 8601, 8600]
```
The difference is in the number of inlined items. There are 1558 unique
inlined items. With the old algorithm these were placed 2618 times,
resulting in 1060 duplicates. With the new algorithm these were placed
2410 times, resulting in 852 duplicates. Also, the mean CGU size dropped
from 9659.5 to 9200.1, and the CGU size distribution tightened, with the
biggest one a little smaller and the smallest ones a little bigger.
2023-07-19 07:23:11 +10:00
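The duplicate counts and ratios quoted in 05de5d6f64 above can be rechecked from the printed stats. This is only arithmetic on the numbers in the message; the struct and field names are illustrative and do not come from the compiler.

```rust
/// Item counts as printed in the CGU stats (field names are illustrative).
struct PlacementStats {
    unique_items: usize,
    placed_items: usize,
    unique_inlined: usize,
    placed_inlined: usize,
}

impl PlacementStats {
    /// Extra copies of inlined items beyond one placement each.
    fn duplicates(&self) -> usize {
        self.placed_inlined - self.unique_inlined
    }

    /// The "placed/unique items ratio" from the stats output.
    fn items_ratio(&self) -> f64 {
        self.placed_items as f64 / self.unique_items as f64
    }
}

const OLD: PlacementStats = PlacementStats {
    unique_items: 2774,
    placed_items: 3834,
    unique_inlined: 1558,
    placed_inlined: 2618,
};

const NEW: PlacementStats = PlacementStats {
    unique_items: 2774,
    placed_items: 3626,
    unique_inlined: 1558,
    placed_inlined: 2410,
};

fn main() {
    for (label, stats) in [("old", OLD), ("new", NEW)] {
        println!(
            "{}: {} duplicates, items ratio {:.2}",
            label,
            stats.duplicates(),
            stats.items_ratio()
        );
    }
}
```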
Nicholas Nethercote
b2c3948892 Split the CGU merging loop.
It has two conditions. This commit splits it in two, one per condition.
The next commit will change the first loop.
2023-07-19 07:23:11 +10:00
Nicholas Nethercote
77b053a2dd Add MonoItemData::inlined. 2023-07-19 07:23:09 +10:00
Nicholas Nethercote
87c509da95 Ignore unreachable inlined items in debug_dump.
They're quite rare, and ignoring them simplifies things quite a bit, and
further reduces the number of calls to `MonoItem::size_estimate` to the
number of placed items (one per root item, and one or more per reachable
inlined item).
2023-07-17 08:44:48 +10:00
Nicholas Nethercote
edd1f3827e Store item size estimate in MonoItemData.
This means we call `MonoItem::size_estimate` (which involves a query)
less often: just once per mono item, and then once more per inline item
placement. After that we can reuse the stored value as necessary. This
means `CodegenUnit::compute_size_estimate` is cheaper.
2023-07-17 08:44:48 +10:00
Nicholas Nethercote
b52f9eb6ca Introduce MonoItemData.
It replaces `(Linkage, Visibility)`, making the code nicer. Plus the
next commit will add another field.
2023-07-17 08:44:48 +10:00
Mahdi Dibaiee
e55583c4b8 refactor(rustc_middle): Substs -> GenericArg 2023-07-14 13:27:35 +01:00
Matthias Krüger
d5b1ef98b0
Rollup merge of #113390 - nnethercote:cgu-tweaks, r=wesleywiser
CGU formation tweaks

Minor improvements I found while trying out something bigger that didn't work out.

r? `@wesleywiser`
2023-07-08 15:49:46 +02:00
Nicholas Nethercote
fc8536669c Diagnose unsorted CGUs.
An assertion failure was reported in #112946. This extra information
will help diagnose the problem.
2023-07-06 18:27:25 +10:00
Nicholas Nethercote
3078e4d804 Minor comment fix. 2023-07-06 11:07:22 +10:00
Nicholas Nethercote
b51169c178 Remove the field name from MonoItemPlacement::SingleCgu.
It's needless verbosity.
2023-07-06 10:35:57 +10:00
Nicholas Nethercote
22d4c798ec Use iter() instead of iter_mut() in one place. 2023-07-06 10:35:57 +10:00
Nicholas Nethercote
142075a9fb Make UsageMap::get_user_items infallible.
It's nicer this way.
2023-07-06 10:35:57 +10:00
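One way an accessor like the one in 142075a9fb above can be made infallible is to return an empty slice when an item has no recorded users, so callers never handle an `Option`. This is a sketch under that assumption; the `UsageMap` layout and the `u32` item ids below are hypothetical and not taken from the compiler.

```rust
use std::collections::HashMap;

/// Hypothetical usage map: item id -> ids of the items that use it.
struct UsageMap {
    user_map: HashMap<u32, Vec<u32>>,
}

impl UsageMap {
    /// Never fails: an item with no recorded users yields an empty slice.
    fn get_user_items(&self, item: u32) -> &[u32] {
        self.user_map
            .get(&item)
            .map(|users| users.as_slice())
            .unwrap_or(&[])
    }
}

fn main() {
    let mut user_map = HashMap::new();
    user_map.insert(1, vec![2, 3]);
    let usage = UsageMap { user_map };
    println!("{:?}", usage.get_user_items(1));
    println!("{:?}", usage.get_user_items(99));
}
```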
Nicholas Nethercote
666b1b68a7 Tweak thread names for CGU processing.
For non-incremental builds on Unix, currently all the thread names look
like `opt regex.f10ba03eb5ec7975-cgu.0`. But they are truncated by
`pthread_setname` to `opt regex.f10ba`, hiding the numeric suffix that
distinguishes them. This is really annoying when using a profiler like
Samply.

This commit changes these thread names to a form like `opt cgu.0`, which
is much better.
2023-06-26 09:14:45 +10:00
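The truncation described in 666b1b68a7 above can be reproduced with a small sketch. It assumes a visible limit of 15 bytes, which matches the truncated name quoted in the message; the constant and the helper are illustrative and do not call any OS API.

```rust
/// Assumed visible thread-name limit in bytes (matches `opt regex.f10ba`).
const MAX_VISIBLE_BYTES: usize = 15;

/// Cut a thread name down to the assumed limit, respecting UTF-8 boundaries.
fn truncate_thread_name(name: &str) -> &str {
    let mut end = name.len().min(MAX_VISIBLE_BYTES);
    while !name.is_char_boundary(end) {
        end -= 1;
    }
    &name[..end]
}

fn main() {
    // Old scheme: the distinguishing numeric suffix is lost.
    println!("{}", truncate_thread_name("opt regex.f10ba03eb5ec7975-cgu.0"));
    // New scheme: the whole name fits.
    println!("{}", truncate_thread_name("opt cgu.0"));
}
```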
Nicholas Nethercote
487bdeb519 Improve ordering and naming of CGUs for non-incremental builds.
Currently there are two problems.

First, the CGUs don't end up in size order. The merging loop does sort
by size on each iteration, but we don't sort after the final merge, so
typically there is one CGU out of place. (And sometimes we don't enter
the merging loop at all, in which case they end up in random order.)

Second, we then assign names that differ only by a numeric suffix, and
then we sort them lexicographically by name, giving us an order like
this:

```
regex.f10ba03eb5ec7975-cgu.1
regex.f10ba03eb5ec7975-cgu.10
regex.f10ba03eb5ec7975-cgu.11
regex.f10ba03eb5ec7975-cgu.12
regex.f10ba03eb5ec7975-cgu.13
regex.f10ba03eb5ec7975-cgu.14
regex.f10ba03eb5ec7975-cgu.15
regex.f10ba03eb5ec7975-cgu.2
regex.f10ba03eb5ec7975-cgu.3
regex.f10ba03eb5ec7975-cgu.4
regex.f10ba03eb5ec7975-cgu.5
regex.f10ba03eb5ec7975-cgu.6
regex.f10ba03eb5ec7975-cgu.7
regex.f10ba03eb5ec7975-cgu.8
regex.f10ba03eb5ec7975-cgu.9
```

These two problems are really annoying when debugging and profiling the
CGUs.

This commit ensures CGUs are sorted by name *and* reverse sorted by
size. This involves (a) one extra sort by size operation, and (b)
padding the numeric indices with zeroes, e.g.
`regex.f10ba03eb5ec7975-cgu.01`.

(Note that none of this applies for incremental builds, where a
different hash-based CGU naming scheme is used.)
2023-06-26 09:14:11 +10:00
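The zero-padding idea from 487bdeb519 above can be sketched as follows. The helper `cgu_names` and its way of choosing the padding width (the digit count of the largest index) are assumptions for illustration; the commit message only states that indices are padded with zeroes.

```rust
/// Build CGU names whose lexicographic order equals their numeric order.
/// The width rule (digits of the largest index) is an assumption.
fn cgu_names(crate_prefix: &str, count: usize) -> Vec<String> {
    let width = count.saturating_sub(1).to_string().len();
    (0..count)
        .map(|index| format!("{}-cgu.{:0width$}", crate_prefix, index, width = width))
        .collect()
}

fn main() {
    let mut names = cgu_names("regex.f10ba03eb5ec7975", 16);
    // Sorting by name no longer puts cgu.10 before cgu.2.
    names.sort();
    for name in &names {
        println!("{}", name);
    }
}
```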
Nicholas Nethercote
abde9ba527 Tweak CGU size estimate code.
- Rename `create_size_estimate` as `compute_size_estimate`, because that
  makes more sense for the second and subsequent calls for each CGU.
- Change `CodegenUnit::size_estimate` from `Option<usize>` to `usize`.
  We can still assert that `compute_size_estimate` is called first.
- Move the size estimation for `place_mono_items` inside the function,
  for consistency with `merge_codegen_units`.
2023-06-22 09:33:06 +10:00
Nicholas Nethercote
105ac1c26d Merge root and inlined item placement.
There's no longer any need for them to be separate, and putting them
together reduces the amount of code.
2023-06-22 08:10:29 +10:00
Nicholas Nethercote
6f228e3420 Inline before merging CGUs.
Because CGU merging relies on CGU sizes, but the CGU sizes before
inlining aren't accurate.

This requires tweaking how the sizes are updated during merging: if CGU
A and B both have an inlined function F, then `size(A + B)` will be a
little less than `size(A) + size(B)`, because `A + B` will only have one
copy of F. Also, the minimum CGU size is increased because it now has to
account for inlined functions.

This change doesn't have much effect on compile perf, but it makes
follow-on changes that involve more sophisticated reasoning about CGU
sizes much easier.
2023-06-22 08:10:29 +10:00
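The size reasoning in 6f228e3420 above (`size(A + B)` is a little less than `size(A) + size(B)` when both hold the same inlined function F) can be illustrated with a toy model. The `Cgu` type, the item names, and the sizes below are hypothetical; this is not how the compiler stores mono items.

```rust
use std::collections::HashMap;

/// Toy CGU: item name -> size estimate. Inlined items may appear in several CGUs.
struct Cgu {
    items: HashMap<&'static str, usize>,
}

impl Cgu {
    fn new(items: &[(&'static str, usize)]) -> Self {
        Cgu { items: items.iter().copied().collect() }
    }

    fn size(&self) -> usize {
        self.items.values().sum()
    }

    /// Merging keeps a single copy of any item present in both CGUs.
    fn merge(&mut self, other: Cgu) {
        self.items.extend(other.items);
    }
}

fn main() {
    let mut a = Cgu::new(&[("a_root", 100), ("F", 30)]);
    let b = Cgu::new(&[("b_root", 50), ("F", 30)]);
    let separate = a.size() + b.size();
    a.merge(b);
    // The merged CGU holds only one copy of F.
    println!("size(A) + size(B) = {}, size(A + B) = {}", separate, a.size());
}
```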
Nicholas Nethercote
f6cadae163 Streamline some comments. 2023-06-22 08:10:29 +10:00
Nicholas Nethercote
2af5f2276d Merge CGUs in a nicer way. 2023-06-15 18:58:23 +10:00
Nicholas Nethercote
e414d25e94 Make partition more consistent.
Always put the `create_size_estimate` calls and `debug_dump` calls
within a timed scope. This makes the four main steps look more similar
to each other.
2023-06-15 10:39:39 +10:00
Nicholas Nethercote
57a7c8f577 Fix bug in mark_code_coverage_dead_code_cgus.
The comment says "Find the smallest CGU that has exported symbols and
put the dead function stubs in that CGU". But the code sorts the CGUs by
size (smallest first) and then searches them in reverse order, which
means it will find the *largest* CGU that has exported symbols.

The erroneous code was introduced in #92142.

This commit changes it to use a simpler search, avoiding the sort, and
fixes the bug in the process.
2023-06-15 10:39:04 +10:00
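The bug described in 57a7c8f577 above is easy to reproduce in miniature. The sketch below contrasts the sort-then-reverse-search pattern with a direct minimum search; the `Cgu` struct and both function names are hypothetical and only mirror the description in the message.

```rust
/// Hypothetical stand-in for a codegen unit.
#[derive(Debug)]
struct Cgu {
    name: &'static str,
    size_estimate: usize,
    has_exported_symbols: bool,
}

/// Mirrors the described bug: sort smallest-first, then search in reverse.
/// This finds the largest exporting CGU, not the smallest.
fn reverse_search_after_sort(cgus: &[Cgu]) -> Option<&Cgu> {
    let mut sorted: Vec<&Cgu> = cgus.iter().collect();
    sorted.sort_by_key(|cgu| cgu.size_estimate);
    sorted.into_iter().rev().find(|cgu| cgu.has_exported_symbols)
}

/// A simpler search with no sort: the smallest CGU that has exported symbols.
fn smallest_exporting_cgu(cgus: &[Cgu]) -> Option<&Cgu> {
    cgus.iter()
        .filter(|cgu| cgu.has_exported_symbols)
        .min_by_key(|cgu| cgu.size_estimate)
}

fn main() {
    let cgus = [
        Cgu { name: "a", size_estimate: 10, has_exported_symbols: false },
        Cgu { name: "b", size_estimate: 20, has_exported_symbols: true },
        Cgu { name: "c", size_estimate: 30, has_exported_symbols: true },
    ];
    println!("reverse search: {:?}", reverse_search_after_sort(&cgus).map(|c| c.name));
    println!("direct minimum: {:?}", smallest_exporting_cgu(&cgus).map(|c| c.name));
}
```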
Nicholas Nethercote
9d7295f0be Move dead CGU marking code out of partition.
The other major steps in `partition` have their own function, so it's
nice for this one to be likewise.
2023-06-15 10:02:13 +10:00
Nicholas Nethercote
7c3ce02a11 Introduce a minimum CGU size in non-incremental builds.
Because tiny CGUs make compilation less efficient *and* result in worse
generated code.

We don't do this when the number of CGUs is explicitly given, because
there are times when the requested number is very important, as
described in some comments within the commit. So the commit also
introduces a `CodegenUnits` type that distinguishes between default
values and user-specified values.

This change has a roughly neutral effect on walltimes across the
rustc-perf benchmarks; there are some speedups and some slowdowns. But
it has significant wins for most other metrics on numerous benchmarks,
including instruction counts, cycles, binary size, and max-rss. It also
reduces parallelism, which is good for reducing jobserver competition
when multiple rustc processes are running at the same time. It's smaller
benchmarks that benefit the most; larger benchmarks already have CGUs
that are all larger than the minimum size.

Here are some example before/after CGU sizes for opt builds.

- html5ever
  - CGUs: 16, mean size: 1196.1, sizes: [3908, 2992, 1706, 1652, 1572,
    1136, 1045, 948, 946, 938, 579, 471, 443, 327, 286, 189]
  - CGUs: 4, mean size: 4396.0, sizes: [6706, 3908, 3490, 3480]

- libc
  - CGUs: 12, mean size: 35.3, sizes: [163, 93, 58, 53, 37, 8, 2 (x6)]
  - CGUs: 1, mean size: 424.0, sizes: [424]

- tt-muncher
  - CGUs: 5, mean size: 1819.4, sizes: [8508, 350, 198, 34, 7]
  - CGUs: 1, mean size: 9075.0, sizes: [9075]

Note that CGUs of size 100,000+ aren't unusual in larger programs.
2023-06-14 10:57:44 +10:00
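A rough sketch of the policy in 7c3ce02a11 above: the minimum-size rule applies only when the CGU count is a default, not when the user requested it. The enum shape follows the message's description of a `CodegenUnits` type; the 1000 threshold, the merge-two-smallest strategy, and the plain addition of sizes are simplifying assumptions, not the compiler's values or algorithm.

```rust
use std::cmp::Reverse;

/// Distinguishes a user-requested CGU count from a default one.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CodegenUnits {
    User(usize),
    Default(usize),
}

impl CodegenUnits {
    fn as_usize(self) -> usize {
        match self {
            CodegenUnits::User(n) | CodegenUnits::Default(n) => n,
        }
    }
}

/// `sizes` must be sorted largest-first and hold at least two entries.
fn merge_two_smallest(sizes: &mut Vec<usize>) {
    let a = sizes.pop().expect("caller checks len >= 2");
    let b = sizes.pop().expect("caller checks len >= 2");
    sizes.push(a + b);
    sizes.sort_by_key(|&s| Reverse(s));
}

fn merge_cgus(mut sizes: Vec<usize>, limit: CodegenUnits, min_size: usize) -> Vec<usize> {
    sizes.sort_by_key(|&s| Reverse(s));
    // First loop: never exceed the requested or default count.
    while sizes.len() > limit.as_usize() && sizes.len() >= 2 {
        merge_two_smallest(&mut sizes);
    }
    // Second loop: enforce the minimum size only for default counts.
    if let CodegenUnits::Default(_) = limit {
        while sizes.len() >= 2 && sizes[sizes.len() - 1] < min_size {
            merge_two_smallest(&mut sizes);
        }
    }
    sizes
}

fn main() {
    // The libc "before" sizes from the message; 1000 is a made-up threshold.
    let libc = vec![163, 93, 58, 53, 37, 8, 2, 2, 2, 2, 2, 2];
    println!("{:?}", merge_cgus(libc.clone(), CodegenUnits::Default(16), 1000));
    println!("{:?}", merge_cgus(libc, CodegenUnits::User(12), 1000));
}
```

With the assumed threshold, the default case collapses the twelve tiny libc CGUs into a single CGU of size 424, matching the "after" line in the message, while the user-specified case leaves all twelve in place.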
Nicholas Nethercote
95d85899ce Add more measurements to the CGU debug printing.
For example, we go from this:
```
FINAL (4059 items, total_size=232342; 16 CGUs, max_size=39608,
min_size=5468, max_size/min_size=7.2):
- CGU[0] regex.f2ff11e98f8b05c7-cgu.0 (318 items, size=39608):
  - fn ...
  - fn ...
```
to this:
```
FINAL
- unique items: 2726 (1459 root + 1267 inlined), unique size: 201214 (146046 root + 55168 inlined)
- placed items: 4059 (1459 root + 2600 inlined), placed size: 232342 (146046 root + 86296 inlined)
- placed/unique items ratio: 1.49, placed/unique size ratio: 1.15
- CGUs: 16, mean size: 14521.4, sizes: [39608, 31122, 20318, 20236, 16268, 13777, 12310, 10531, 10205, 9810, 9250, 9065 (x2), 7785, 7524, 5468]

- CGU[0]
  - regex.f2ff11e98f8b05c7-cgu.0, size: 39608
  - items: 318, mean size: 124.6, sizes: [28395, 3418, 558, 485, 259, 228, 176, 166, 146, 118, 117 (x3), 114 (x5), 113 (x3), 101, 84, 82, 77, 76, 72, 71 (x2), 66, 65, 62, 61, 59 (x2), 57, 55, 54 (x2), 53 (x4), 52 (x5), 51 (x4), 50, 48, 47, 46, 45 (x3), 44, 43 (x5), 42, 40, 38 (x4), 37, 35, 34 (x2), 32 (x2), 31, 30, 28 (x2), 27 (x2), 26 (x3), 24 (x2), 23 (x3), 22 (x2), 21, 20, 16 (x4), 15, 13 (x7), 12 (x3), 11 (x6), 10, 9 (x2), 8 (x4), 7 (x8), 6 (x38), 5 (x21), 4 (x7), 3 (x45), 2 (x63), 1 (x13)]
  - fn ...
  - fn ...
```
This is a lot more information, distinguishing between root items and
inlined items, showing how much duplication there is of inlined items,
plus the full range of sizes for CGUs and items within CGUs. All of
which is really helpful when analyzing this stuff and trying different
CGU formation algorithms.
2023-06-14 10:15:59 +10:00
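The compact size lists in the output of 95d85899ce above collapse runs of equal values into `value (xN)`. A minimal sketch of that formatting follows; `format_sizes` is a hypothetical helper, not the function used by `debug_dump`, and it assumes the input is already sorted so equal values are adjacent.

```rust
/// Render sizes as `[a, b (x3), c]`, collapsing runs of equal adjacent values.
fn format_sizes(sizes: &[usize]) -> String {
    let mut parts: Vec<String> = Vec::new();
    let mut i = 0;
    while i < sizes.len() {
        let value = sizes[i];
        // Length of the run of `value` starting at position i (always >= 1).
        let run = sizes[i..].iter().take_while(|&&s| s == value).count();
        if run > 1 {
            parts.push(format!("{} (x{})", value, run));
        } else {
            parts.push(value.to_string());
        }
        i += run;
    }
    format!("[{}]", parts.join(", "))
}

fn main() {
    let sizes = [8604, 8603, 8603, 8603, 8602, 8602, 8601, 8600];
    println!("{}", format_sizes(&sizes));
}
```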
Nicholas Nethercote
51821515b3 Remove PartitioningCx::target_cgu_count.
Because that value can be easily obtained from `PartitioningCx::tcx`.
2023-06-13 16:47:09 +10:00
Nicholas Nethercote
853345635b Move mono_item_placement construction.
It's currently created in `place_inlined_mono_items` and then used in
`internalize_symbols`. This commit moves the creation to
`internalize_symbols`.
2023-06-07 11:02:15 +10:00
Nicholas Nethercote
1defd30764 Remove PlacedRootMonoItems::roots.
It's no longer used.
2023-06-07 10:27:00 +10:00