Bazel collect2: fatal error: cannot find 'ld' with build
The issue at hand stems from the linker ld.lld, which is specified by the -fuse-ld=lld option, not being present in the PATH during the binary linking process. The error message "cannot find 'ld'" can be misleading; it does not refer to /usr/bin/ld, but rather indicates a failure to locate ld.lld. The resolution involves adding the directory containing ld.lld to your PATH, either by overriding the PATH variable directly or by using --host_action_env=PATH=xxx.
Note: Overriding the PATH variable in the shell will not work if the --incompatible_strict_action_env option is enabled.
Our CI system is reporting a build error for protoc that states: collect2: fatal error: cannot find 'ld' with build. This is perplexing because we have never encountered this error on our development machines. Moreover, we cannot reproduce the error by running the command bazel build xxx manually on either the CI machine or a development machine, and ld is indeed present in /usr/bin. This raises the question: why does Bazel indicate that it cannot find ld? There is an existing issue on the Bazel GitHub repository, but it does not provide a solution.
Initially, I suspected that the presence of -fuse-ld=lld in the parameter file was the source of a Bazel bug. However, upon examining Bazel's source code more closely, I realized my assumption was incorrect. The -fuse-ld=lld option is derived from link_flags in local_config_cc/BUILD. The template for local_config_cc/BUILD can be found at @rules_cc//cc/private/toolchain:BUILD.tpl, while its instantiation occurs in unix_cc_configure.bzl. The setting of link_flags is determined by the following code snippet:
gold_or_lld_linker_path = (
    _find_linker_path(repository_ctx, cc, "lld", is_clang) or
    _find_linker_path(repository_ctx, cc, "gold", is_clang)
)
cc_path = repository_ctx.path(cc)
if not str(cc_path).startswith(str(repository_ctx.path(".")) + "/"):
    # cc is outside the repository, set -B
    bin_search_flags = ["-B" + escape_string(str(cc_path.dirname))]
else:
    # cc is inside the repository, don't set -B.
    bin_search_flags = []
if not gold_or_lld_linker_path:
    ld_path = repository_ctx.path(tool_paths["ld"])
    if ld_path.dirname != cc_path.dirname:
        bin_search_flags.append("-B" + str(ld_path.dirname))
force_linker_flags = []
if gold_or_lld_linker_path:
    force_linker_flags.append("-fuse-ld=" + gold_or_lld_linker_path)
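To make the precedence explicit, here is a minimal Python sketch of the selection logic in the snippet above. It is not the rules_cc implementation: select_force_linker_flags and the probe callable are hypothetical names, and probe merely stands in for _find_linker_path, returning a linker name or path on success and None otherwise.

```python
from typing import Callable, List, Optional


def select_force_linker_flags(probe: Callable[[str], Optional[str]]) -> List[str]:
    """Mimic the lld-then-gold preference from the snippet above."""
    # lld is tried first; gold is only probed when lld was not usable.
    linker = probe("lld") or probe("gold")
    if not linker:
        # Neither probe succeeded: no -fuse-ld flag is added.
        return []
    return ["-fuse-ld=" + linker]


# Example: a machine where only the lld probe succeeds.
print(select_force_linker_flags(lambda name: name if name == "lld" else None))
# prints ['-fuse-ld=lld']
```

Because the result is baked into the generated toolchain, the flag is decided when the probe runs, not when an individual link action runs.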
The _find_linker_path function performs the detection by running the command gcc xxx.cc -Wl,--start-lib -Wl,--end-lib -fuse-ld=<linker> -v:
result = repository_ctx.execute([
    cc,
    str(repository_ctx.path("tools/cpp/empty.cc")),
    "-o",
    "/dev/null",
    "-Wl,--start-lib",
    "-Wl,--end-lib",
    "-fuse-ld=" + linker,
    "-v",
])
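The same probe can be reproduced outside Bazel. The following Python sketch builds the identical argument list and runs it with subprocess; build_probe_command and linker_is_usable are hypothetical helper names, and treating a zero exit status as "linker usable" is an assumption, since the excerpt above does not show how the result is interpreted.

```python
import subprocess
from typing import List


def build_probe_command(cc: str, source: str, linker: str) -> List[str]:
    """Return the argument list used by the probe shown above."""
    return [
        cc,
        source,
        "-o",
        "/dev/null",
        "-Wl,--start-lib",
        "-Wl,--end-lib",
        "-fuse-ld=" + linker,
        "-v",
    ]


def linker_is_usable(cc: str, source: str, linker: str) -> bool:
    """Run the probe; assume exit status 0 means the linker was found."""
    try:
        result = subprocess.run(
            build_probe_command(cc, source, linker),
            capture_output=True,
            text=True,
        )
    except OSError:
        # The compiler itself could not be started.
        return False
    return result.returncode == 0
```

Calling linker_is_usable("gcc", "empty.cc", "lld") under different PATH values, with an empty source file of your own, is one way to see whether the probe outcome depends on the environment.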
In my development environment, this command executes successfully. However, when I run it on the CI machine, it fails, returning the error collect2: fatal error: cannot find 'ld' with build. This leads me to question why gcc cannot find ld, despite it being located in /usr/bin.
I tested the command both with and without -fuse-ld=lld, and it failed only when the option was present. This led me to suspect that the issue lies not with ld, but with lld not being found. To investigate further, I ran the command with -Wl,-verbose and confirmed that using -fuse-ld=lld indeed invokes ld.lld during the linking phase.
I also tried creating a symbolic link for ld.lld on the CI machine, which resolved the issue. Adding the directory containing ld.lld to the PATH worked as well. These tests confirmed that specifying -fuse-ld=lld requires ld.lld to be present in the PATH to avoid the cannot find 'ld' error.
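A quick way to check this from a script is to look up ld.lld against a specific PATH value. The sketch below uses Python's shutil.which; find_linker is a hypothetical helper name, and the ld.<name> naming follows the ld.lld lookup observed above. It only inspects PATH, whereas gcc also searches its own program directories and any -B prefixes, so treat a miss as a hint rather than proof.

```python
import os
import shutil
from typing import Optional


def find_linker(fuse_ld: str, path_value: str) -> Optional[str]:
    """Return the full path of ld.<fuse_ld> found on path_value, or None."""
    return shutil.which("ld." + fuse_ld, path=path_value)


if __name__ == "__main__":
    # Compare the interactive PATH with the minimal default PATH
    # mentioned at the end of this post.
    print("current PATH:", find_linker("lld", os.environ.get("PATH", "")))
    print("minimal PATH:", find_linker("lld", "/bin:/usr/bin:/usr/local/bin"))
```

If the first lookup prints a path and the second prints None, a link action that runs with the minimal PATH would not see ld.lld.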
Another pressing question is why Bazel reported an error even though I set the PATH using --action_env=PATH=xxx. Moreover, changing the PATH before running Bazel also yielded the same error. The protoc target has no unusual settings; you can see its definition here:
cc_binary(
    name = "protoc",
    copts = COPTS,
    linkopts = LINK_OPTS,
    visibility = ["//visibility:public"],
    deps = ["//src/google/protobuf/compiler:protoc_lib"],
)
From the CI error message, I noted the [for tool] suffix:
Linking external/com_google_protobuf/protoc [for tool] failed: (Exit 1): gcc failed: error executing command (from target @com_google_protobuf//:protoc) /usr/bin/gcc @bazel-out/k8-opt-exec-2B5CBBC6/bin/external/com_google_protobuf/protoc-2.params
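Both the [for tool] suffix and the output directory name in that log line hint that the failing action belongs to a tool rather than to the requested target. The Python sketch below turns this into a rough check; built_for_exec is a hypothetical helper, and the assumption that tool output directories contain "-exec" is based only on the path in this log and may not hold for every Bazel version.

```python
import re

# Captures the configuration directory directly below bazel-out/.
_CONFIG_DIR = re.compile(r"bazel-out/([^/\s]+)/")


def built_for_exec(message: str) -> bool:
    """Heuristic: does this Bazel error line refer to a tool build?"""
    if "[for tool]" in message:
        return True
    match = _CONFIG_DIR.search(message)
    # Assumption: tool output directories look like k8-opt-exec-<hash>.
    return bool(match) and "-exec" in match.group(1)


line = (
    "Linking external/com_google_protobuf/protoc [for tool] failed: (Exit 1): "
    "gcc failed: error executing command (from target @com_google_protobuf//:protoc) "
    "/usr/bin/gcc @bazel-out/k8-opt-exec-2B5CBBC6/bin/external/com_google_protobuf/protoc-2.params"
)
print(built_for_exec(line))  # prints True
```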
The only instance where we utilize protoc is as follows:
cc_proto_library(
    name = "echo_cc_library",
    srcs = ["echo_message.proto"],
    cc_libs = ["@com_google_protobuf//:protobuf"],
    visibility = ["//visibility:public"],
)
proto_gen = rule(
    attrs = {
        "protoc": attr.label(
            cfg = "host",
            executable = True,
            allow_single_file = True,
            mandatory = True,
        ),
    },
)
Here, the protoc attribute refers to @com_google_protobuf//:protoc.
Note: There is a warning message indicating that cfg = "host" in attribute definitions should be replaced by cfg = "exec" (buildifier(attr-cfg)).
The documentation for attr.label describes the cfg parameter as follows:
Configuration of the attribute. It can be either “exec”, which indicates that the dependency is built for the execution platform, or “target”, which indicates that the dependency is build for the target platform. A typical example of the difference is when building mobile apps, where the target platform is Android or iOS while the execution platform is Linux, macOS, or Windows. This parameter is required if executable is True to guard against accidentally building host tools in the target configuration. “target” has no semantic effect, so don’t set it when executable is False unless it really helps clarify your intentions.
In general, sources, dependent libraries, and executables that will be needed at runtime can use the same configuration. Tools that are executed as part of the build (such as compilers or code generators) should be built for an exec configuration. In this case, specify cfg = “exec” in the attribute. Otherwise, executables that are used at runtime (such as as part of a test) should be built for the target configuration. In this case, specify cfg = “target” in the attribute.
Bazel recognizes three roles that a platform may serve:
- Host - the platform on which Bazel itself runs.
- Execution - a platform on which build tools execute build actions to produce intermediate and final outputs.
- Target - a platform on which a final output resides and executes.
Bazel supports the following build scenarios regarding platforms:
- Single-platform builds (default) - host, execution, and target platforms are the same. For example, building a Linux executable on Ubuntu running on an Intel x64 CPU.
- Cross-compilation builds - host and execution platforms are the same, but the target platform is different. For example, building an iOS app on macOS running on a MacBook Pro.
- Multi-platform builds - host, execution, and target platforms are all different.
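The three scenarios can be written down as a small decision function. This is only an illustrative Python sketch of the definitions quoted above; classify_build and the platform strings are hypothetical, and combinations the list does not mention are reported as "unclassified".

```python
def classify_build(host: str, execution: str, target: str) -> str:
    """Map the three platform roles to the scenarios listed above."""
    if host == execution == target:
        return "single-platform"
    if host == execution:
        # Host and execution match, only the target differs.
        return "cross-compilation"
    if len({host, execution, target}) == 3:
        return "multi-platform"
    # Not covered by the three scenarios in the list.
    return "unclassified"


print(classify_build("linux-x64", "linux-x64", "linux-x64"))  # prints single-platform
print(classify_build("macos", "macos", "ios"))                # prints cross-compilation
```

Note that even in a single-platform build, a tool declared with cfg = "exec" is still built in the execution configuration, which is why the distinction matters for protoc.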
I can't recall how I stumbled upon the --host_action_env option; it may have simply been through searching for the term action_env on the Bazel official website.
The --host_action_env option is documented as follows:
Specifies the set of environment variables available to actions with execution configurations. Variables can be either specified by name, in which case the value will be taken from the invocation environment, or by the name=value pair which sets the value independent of the invocation environment. This option can be used multiple times; for options given for the same variable, the latest wins, options for different variables accumulate.
This option is exactly what we need, because protoc is built in the execution configuration. Setting the PATH with --host_action_env=PATH=xxx therefore makes ld.lld visible to the link action for protoc.
I have created a demo to reproduce this issue; you may find it at this repository and CI logs.
# demo.bzl
def _impl(ctx):
    ctx.actions.run(
        inputs = [],
        outputs = [ctx.outputs.output_file],
        arguments = [ctx.outputs.output_file.path],
        executable = ctx.executable._tool,
    )
    return

demo_rule = rule(
    implementation = _impl,
    attrs = {
        "_tool": attr.label(
            executable = True,
            cfg = "exec",
            allow_files = True,
            default = Label("//:demo"),
        ),
        "output_file": attr.output(mandatory = True),
    },
)
# BUILD.bazel
load(":demo.bzl", "demo_rule")

cc_binary(
    name = "demo",
    srcs = ["main.cc"],
    # linkopts = ["-Wl,-verbose"],
)

demo_rule(
    name = "tool",
    output_file = "demo_output",
)
Q&A
Q: Why is -fuse-ld=lld used when compiling with GCC?
A: Bazel automatically detects the available linker during the initialization of local_config_cc. The relevant code can be found here.
Bazel first checks whether lld is available by executing gcc xxx.cc -Wl,--start-lib -Wl,--end-lib -fuse-ld=lld -v. If lld is not found, it checks for ld.gold with the command gcc xxx.cc -Wl,--start-lib -Wl,--end-lib -fuse-ld=gold -v. You can review the source for this process here.
Therefore, if ld.lld is present in your PATH when local_config_cc is initialized, local_config_cc will be configured to link with lld.
Q: Why does a binary linking error occur even when lld is specified in the PATH using --action_env=PATH=xxx?
A: The tool we are using, such as protoc, is built with the following configuration:
"_tool": attr.label(
    executable = True,
    cfg = "exec",
    allow_files = True,
    mandatory = True,
),
The cfg = "exec" setting indicates that the tool is built for the execution platform. For further details, please refer to the documentation on attr.label.
In Bazel, a platform can serve one of three roles: Host, Execution, or Target. Generally, we build binaries or libraries for the Target platform, for instance, when executing bazel build :demo, and the PATH specified with --action_env=PATH=xxx applies only to those actions. A tool declared with cfg = "exec" is built in the execution configuration instead, where this PATH is not effective: its actions inherit PATH from the shell environment, or use the default /bin:/usr/bin:/usr/local/bin if the --incompatible_strict_action_env option is enabled. To set the PATH for these actions, use --host_action_env=PATH=xxx.
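The rules in this answer can be summarized as a small lookup. The Python sketch below is a simplified model of the behavior described in this post, not Bazel's actual implementation; effective_path, its parameters, and the /opt/llvm/bin directory are all hypothetical.

```python
from typing import Optional

# Default PATH used when --incompatible_strict_action_env is enabled.
STRICT_DEFAULT_PATH = "/bin:/usr/bin:/usr/local/bin"


def effective_path(
    config: str,
    shell_path: str,
    action_env_path: Optional[str] = None,
    host_action_env_path: Optional[str] = None,
    strict: bool = False,
) -> str:
    """Model which PATH an action sees, per the description above."""
    if config not in ("target", "exec"):
        raise ValueError("config must be 'target' or 'exec'")
    # --action_env only affects target actions, --host_action_env only exec ones.
    override = action_env_path if config == "target" else host_action_env_path
    if override is not None:
        return override
    return STRICT_DEFAULT_PATH if strict else shell_path


# The failing CI setup: PATH passed via --action_env, but protoc is an exec tool.
llvm_path = "/opt/llvm/bin:/usr/bin"  # hypothetical directory containing ld.lld
print(effective_path("exec", llvm_path, action_env_path=llvm_path, strict=True))
# prints /bin:/usr/bin:/usr/local/bin
```

Passing the same value as host_action_env_path instead returns the PATH that contains ld.lld, which matches the fix described at the top of this post.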