• 0 Posts
  • 4 Comments
Joined 1 year ago
cake
Cake day: July 25th, 2023

help-circle


  • MoSal@lemm.eetoRust Programming@lemmy.mlDid you try Mold?
    link
    fedilink
    English
    arrow-up
    2
    ·
    edit-2
    1 year ago

    Okay. I updated mold to v2.0.0. Added "-Z", "time-passes" to get link times, ran cargo with --timings to get CPU utilization graphs. Tested on two projects of mine (the one from yesterday is “X”).

    Link times are picked as the best from 3-4 runs, changing only white space on main.rs.

    lto="fat" lld mold
    project X (cu=1) 105.923 106.380
    Project X (cu=8) 103.512 103.513
    Project S (cu=1) 94.290 94.969
    Project S (cu=8) 100.118 100.449

    Observations (lto="fat"): As expected, not a lot of utilization of multi-core. Using codegen-units larger than 1 may even cause a regression in link time. Choice of linker between lld and mold appears to be of no significance.


    lto="thin" lld mold
    project X (cu=1) 46.596 47.118
    Project X (cu=8) 34.167 33.839
    Project X (cu=16) 36.296 36.621
    Project S (cu=1) 41.817 41.404
    Project S (cu=8) 32.062 32.162
    Project S (cu=16) 35.780 36.074

    Observations (lto="thin"): Here, we see parallel LLVM_lto_optimize runs kicking in. Testing with codegen-units=16 was also done. In that case, the number of parallel LLVM_lto_optimize runs was so big, the synchronization overhead caused a regression running that test on a humble workstation powered by an Intel i7-7700K processor (4 physical, 8 logical cores only). The results will probably look different running this test case (cu=16) in a more powerful setup. But still, the choice of linker between lld and mold appears to be of no significance.


    lto=false lld mold
    project X (cu=1) 29.160 29.231
    Project X (cu=8) 8.130 8.293
    Project X (cu=16) 7.076 6.953
    Project S (cu=1) 11.996 12.069
    Project S (cu=8) 4.418 4.462
    Project S (cu=16) 4.357 4.455

    Observations (lto=false): Here, codegen-units becomes the dominant factor with no heavy LLVM_lto_optimize runs involved. Going above codegen-units=8 does not hurt link time. Still, the choice of linker between lld and mold appears to be of no significance.


    lto="off" lld mold
    project X (cu=1) 29.109 29.201
    Project X (cu=8) 5.896 6.117
    Project X (cu=16) 3.479 3.637
    Project S (cu=1) 11.732 11.742
    Project S (cu=8) 2.354 2.355
    Project S (cu=16) 1.517 1.499

    Observations (lto="off"): Same observations as lto=false. Still, the choice of linker between lld and mold appears to be of no significance.


    Debug builds link in <.4 seconds.


  • MoSal@lemm.eetoRust Programming@lemmy.mlDid you try Mold?
    link
    fedilink
    English
    arrow-up
    8
    arrow-down
    1
    ·
    1 year ago

    codegen-units=1, debug=true, varying lto

    lto = "fat"

    Flags Clean build time Pre-strip size Post-strip size
    (default) 2:31 90.8207MiB 7.3374MiB
    ["-Z", "gcc-ld=lld"] 2:31 91.9731MiB 7.3332MiB
    linker = "clang" 2:32 90.8207MiB 7.3375MiB
    linker = "clang"; fuse-ld="mold" 2:31 92.1107MiB 7.3334MiB

    lto = "thin"

    Flags Clean build time Pre-strip size Post-strip size
    (default) 1:33 96.9630MiB 8.1695MiB
    ["-Z", "gcc-ld=lld"] 1:32 98.3889MiB 8.1777MiB
    linker = "clang" 1:33 96.9631MiB 8.1695MiB
    linker = "clang"; fuse-ld="mold" 1:32 98.6903MiB 8.1797MiB

    lto = false

    Flags Clean build time Pre-strip size Post-strip size
    (default) 1:32 113.5656MiB 8.0601MiB
    ["-Z", "gcc-ld=lld"] 1:30 115.1210MiB 8.1122MiB
    linker = "clang" 1:32 113.5656MiB 8.0602MiB
    linker = "clang"; fuse-ld="mold" 1:31 115.4679MiB 8.0663MiB

    lto = "off"

    Flags Clean build time Pre-strip size Post-strip size
    (default) 1:33 113.5666MiB 8.0601MiB
    ["-Z", "gcc-ld=lld"] 1:31 115.1231MiB 8.1122MiB
    linker = "clang" 1:32 113.5667MiB 8.0602MiB
    linker = "clang"; fuse-ld="mold" 1:31 115.4697MiB 8.0662MiB

    codegen-units=8, debug=true, varying lto

    lto = "fat"

    Flags Clean build time Pre-strip size Post-strip size
    (default) 2:21 104.9842MiB 7.6304MiB
    ["-Z", "gcc-ld=lld"] 2:19 106.1436MiB 7.6264MiB
    linker = "clang" 2:21 104.9882MiB 7.6344MiB
    linker = "clang"; fuse-ld="mold" 2:19 106.2864MiB 7.6325MiB

    lto = "thin"

    Flags Clean build time Pre-strip size Post-strip size
    (default) 1:12 134.1112MiB 9.0445MiB
    ["-Z", "gcc-ld=lld"] 1:09 136.1897MiB 9.0660MiB
    linker = "clang" 1:12 134.1113MiB 9.0446MiB
    linker = "clang"; fuse-ld="mold" 1:09 136.4466MiB 9.0494MiB

    lto = false

    Flags Clean build time Pre-strip size Post-strip size
    (default) 1:14 158.1049MiB 9.0328MiB
    ["-Z", "gcc-ld=lld"] 1:11 159.9998MiB 9.1129MiB
    linker = "clang" 1:14 158.1050MiB 9.0328MiB
    linker = "clang"; fuse-ld="mold" 1:12 160.3123MiB 9.0428MiB

    lto = "off"

    Flags Clean build time Pre-strip size Post-strip size
    (default) 0:57 145.9463MiB 9.4586MiB
    ["-Z", "gcc-ld=lld"] 0:54 148.6021MiB 9.6001MiB
    linker = "clang" 0:57 145.9464MiB 9.4587MiB
    linker = "clang"; fuse-ld="mold" 0:55 148.8842MiB 9.4668MiB

    mold appears to be similar but not faster than lld.

    With the caveat that this is not a proper benchmark since:

    • I didn’t measure link time alone.
    • I didn’t bother running each case multiple times picking the fastest run (since I perceived the differences to be insignificant).

    And a side note, lto = false appears to be practically useless.