`codegen-units=1`, `debug=true`, varying `lto`
lto = "fat"
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 2:31 | 90.8207MiB | 7.3374MiB |
`["-Z", "gcc-ld=lld"]` | 2:31 | 91.9731MiB | 7.3332MiB |
`linker = "clang"` | 2:32 | 90.8207MiB | 7.3375MiB |
`linker = "clang"; fuse-ld="mold"` | 2:31 | 92.1107MiB | 7.3334MiB |
lto = "thin"
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 1:33 | 96.9630MiB | 8.1695MiB |
`["-Z", "gcc-ld=lld"]` | 1:32 | 98.3889MiB | 8.1777MiB |
`linker = "clang"` | 1:33 | 96.9631MiB | 8.1695MiB |
`linker = "clang"; fuse-ld="mold"` | 1:32 | 98.6903MiB | 8.1797MiB |
lto = false
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 1:32 | 113.5656MiB | 8.0601MiB |
`["-Z", "gcc-ld=lld"]` | 1:30 | 115.1210MiB | 8.1122MiB |
`linker = "clang"` | 1:32 | 113.5656MiB | 8.0602MiB |
`linker = "clang"; fuse-ld="mold"` | 1:31 | 115.4679MiB | 8.0663MiB |
lto = "off"
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 1:33 | 113.5666MiB | 8.0601MiB |
`["-Z", "gcc-ld=lld"]` | 1:31 | 115.1231MiB | 8.1122MiB |
`linker = "clang"` | 1:32 | 113.5667MiB | 8.0602MiB |
`linker = "clang"; fuse-ld="mold"` | 1:31 | 115.4697MiB | 8.0662MiB |
`codegen-units=8`, `debug=true`, varying `lto`
lto = "fat"
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 2:21 | 104.9842MiB | 7.6304MiB |
`["-Z", "gcc-ld=lld"]` | 2:19 | 106.1436MiB | 7.6264MiB |
`linker = "clang"` | 2:21 | 104.9882MiB | 7.6344MiB |
`linker = "clang"; fuse-ld="mold"` | 2:19 | 106.2864MiB | 7.6325MiB |
lto = "thin"
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 1:12 | 134.1112MiB | 9.0445MiB |
`["-Z", "gcc-ld=lld"]` | 1:09 | 136.1897MiB | 9.0660MiB |
`linker = "clang"` | 1:12 | 134.1113MiB | 9.0446MiB |
`linker = "clang"; fuse-ld="mold"` | 1:09 | 136.4466MiB | 9.0494MiB |
lto = false
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 1:14 | 158.1049MiB | 9.0328MiB |
`["-Z", "gcc-ld=lld"]` | 1:11 | 159.9998MiB | 9.1129MiB |
`linker = "clang"` | 1:14 | 158.1050MiB | 9.0328MiB |
`linker = "clang"; fuse-ld="mold"` | 1:12 | 160.3123MiB | 9.0428MiB |
lto = "off"
Flags | Clean build time | Pre-strip size | Post-strip size |
---|---|---|---|
(default) | 0:57 | 145.9463MiB | 9.4586MiB |
`["-Z", "gcc-ld=lld"]` | 0:54 | 148.6021MiB | 9.6001MiB |
`linker = "clang"` | 0:57 | 145.9464MiB | 9.4587MiB |
`linker = "clang"; fuse-ld="mold"` | 0:55 | 148.8842MiB | 9.4668MiB |
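
For anyone wanting to reproduce the rows of the Flags column, here is a rough sketch of how they might map onto `.cargo/config.toml`. The target triple, the choice of the `[target.*]` table, and the exact spelling of the mold flag (`-C link-arg=-fuse-ld=mold`) are assumptions on my part; the tables only give the abbreviated form `fuse-ld="mold"`.

```toml
# .cargo/config.toml (sketch; enable ONE variant per experiment)
# Assumption: the target triple below; substitute your own.
[target.x86_64-unknown-linux-gnu]

# Row (default): leave everything below commented out.

# Row ["-Z", "gcc-ld=lld"]: nightly only, as all -Z flags are.
# rustflags = ["-Z", "gcc-ld=lld"]

# Row linker = "clang": use clang as the linker driver.
# linker = "clang"

# Row linker = "clang"; fuse-ld="mold": clang as driver, asked to use mold.
# Assumption: the flag was passed to clang as a link argument like this.
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

Note that Cargo uses `target.<triple>.rustflags` in preference to `build.rustflags`, so it is safest to keep all flags for one experiment in a single place.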
`mold` appears to be similar to, but not faster than, `lld`.

With the caveat that this is not a proper benchmark, since:
- I didn’t measure link time alone.
- I didn’t bother running each case multiple times and picking the fastest run (since I perceived the differences to be insignificant).

As a side note, `lto = false` appears to be practically useless.
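
For context, the profile settings varied in the tables above would look something like this in `Cargo.toml`. This is a sketch that assumes the measurements were taken with the release profile, which the tables do not state explicitly. The comments on the `lto` values paraphrase the Cargo documentation.

```toml
# Cargo.toml (sketch; assumes the release profile was the one measured)
[profile.release]
codegen-units = 1   # the second set of tables uses 8
debug = true
# lto accepts, among others:
#   "fat"  - whole-program LTO across all crates in the dependency graph
#   "thin" - ThinLTO across all crates
#   false  - "thin local" LTO: only within the local crate, across its
#            codegen units; no LTO at all if codegen-units = 1 or opt-level = 0
#   "off"  - LTO disabled entirely
lto = "fat"
```

That `false` degrades to no LTO when `codegen-units = 1` is consistent with the near-identical `lto = false` and `lto = "off"` tables for `codegen-units=1`; the two only diverge at `codegen-units=8`.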
Okay. I updated `mold` to `v2.0.0`, added `"-Z", "time-passes"` to get link times, and ran cargo with `--timings` to get CPU utilization graphs. Tested on two projects of mine (the one from yesterday is “X”). Link times are picked as the best from 3-4 runs, changing only white space in `main.rs`.

lto="fat"

Observations (`lto="fat"`): As expected, there is not much multi-core utilization. Using `codegen-units` larger than 1 may even cause a regression in link time. The choice of linker between `lld` and `mold` appears to be of no significance.

lto="thin"

Observations (`lto="thin"`): Here, we see parallel `LLVM_lto_optimize` runs kicking in. Testing with `codegen-units=16` was also done. In that case, the number of parallel `LLVM_lto_optimize` runs was so large that the synchronization overhead caused a regression when running that test on a humble workstation powered by an Intel i7-7700K processor (only 4 physical, 8 logical cores). The results will probably look different when running this test case (cu=16) on a more powerful setup. But still, the choice of linker between `lld` and `mold` appears to be of no significance.

lto=false

Observations (`lto=false`): Here, `codegen-units` becomes the dominant factor, with no heavy `LLVM_lto_optimize` runs involved. Going above `codegen-units=8` does not hurt link time. Still, the choice of linker between `lld` and `mold` appears to be of no significance.

lto="off"

Observations (`lto="off"`): Same observations as `lto=false`. Still, the choice of linker between `lld` and `mold` appears to be of no significance.

Debug builds link in under 0.4 seconds.
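
In case it helps with reproducing the link-time measurements, the setup described earlier (`-Z time-passes` for per-pass timings, `--timings` for the CPU utilization graphs) could be sketched roughly as follows. The target triple, the use of the `[target.*]` table, and the `cargo +nightly` invocation are assumptions, not a record of the exact commands used.

```toml
# .cargo/config.toml (sketch; target triple is an assumption)
[target.x86_64-unknown-linux-gnu]
linker = "clang"
# -Z time-passes is a nightly-only rustc flag that prints the time spent in
# each compiler pass; it is appended to whichever linker flags are under test.
rustflags = ["-C", "link-arg=-fuse-ld=mold", "-Z", "time-passes"]

# Then, from the project root (assumes a rustup-managed nightly toolchain):
#   cargo +nightly build --release --timings
# --timings writes an HTML report under target/cargo-timings/.
```

Touching only white space in `main.rs` between runs, as described above, forces the final crate to be rebuilt and relinked while leaving the dependencies cached.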