Integrate Bloop with Pants/Bazel

What is the future of build tools such as Pants and Bazel in the Scala community? Can we accelerate the adoption rate by integrating these tools with the existing tooling ecosystem?

Introduction to basic definitions

Bloop is a Scala build server that compiles, tests and runs Scala fast.

Pants and Bazel are scalable language-agnostic build systems. They support Scala and also need to compile, test and run Scala fast.

Why would we want Bazel and Pants to integrate with Bloop? How could such an integration work given the seemingly competing goals of both tools?

This article answers these questions and summarizes my excellent discussions with Natan (contributor to the Bazel Scala rules @ Wix) and Stu, Danny and Win (core maintainers of Pants @ Twitter) during Scala Days 2019.

Motivation

There are three main arguments to motivate the integration.

#1: Straightforward integration with editors

Adoption of build tools is limited by how well they integrate with existing developer tools, for example how well they can be used from an editor.

Currently, Pants and Bazel only support IntelliJ IDEA via their custom IDEA plugins. These integrations are difficult to build, test and maintain.

Bloop provides Bazel and Pants a quick way to integrate with the vast majority of editors used in the Scala community: IntelliJ IDEA via the built-in BSP support in intellij-scala, and VS Code, Vim, Emacs, Sublime Text 3 and Atom via Metals.

The integration is easy to build, test and maintain. It relieves build tool maintainers from implementing editor-specific support for users and allows improvements in editor support to be shared across build tools.

#2: Faster local developer workflows

Bazel and Pants promise reproducible builds. Reproducibility is a key property of builds. It gives developers confidence to deploy their code and reason about errors and bugs.

To make compilation reproducible, incremental compilation is disabled and build requests trigger a full compilation of a target foo every time:

One of the build inputs of foo is modified (such as a source)
Users ask for build tool diagnostics or a compilation of foo

A best practice in Bazel and Pants is to create fine-grained build targets of a handful of sources. Fine-grained build targets help reduce the overhead of full compiles: they compile faster, increase parallel compilations and enable incremental compilation at the build target level.

However, even under ideal compilation times of 1 or 2 seconds per compiled build target, there are scenarios where instant feedback cannot be achieved:

Language servers such as Metals that forward diagnostics from Bazel and Pants will take 1 or 2 seconds at best to act on diagnostics, making the slowdown noticeable to users.
- Metals also needs class files/semanticdb files to provide a rich editing experience (go to definition, find all references).
Common scenarios such as changing a binary API can trigger many downstream compilations that take a long time to finish, further slowing down the build diagnostics in the editor.

An integration with Bloop speeds up local developer workflows by allowing local build clients (such as editors) to trigger incremental compiles while isolating these compiles completely from Bazel or Pants.

In practice, this means build clients such as Metals can use Bloop to receive build diagnostics fast (on the order of 50-100ms) and collect class files in around 400-500ms, so developers get what feels like instant feedback from the build tool.

And Bloop guarantees compilation requests from Bazel and Pants will:

Trigger a full compile per build target (same output for same input)
Never conflict with other client actions
Produce results that can be reused by clients that want fast, incremental compiles

(These guarantees are unlocked by the latest Bloop v1.3.2 release.)

Integrating with Bloop brings Bazel/Pants users the best of both "worlds":

Bazel and Pants can still offer reproducible builds to users with no cache pollution. The cache engine in Bazel and Pants only gets to "see" class files produced by full compilations.
Developers sensitive to slow feedback in the editor can opt in to incremental compiles from their editor on their local machine. In case of rare incremental errors, they can trigger a compilation from the build tool manually to restore a clean state.
Developers that don't want to compromise build reproducibility to get faster workflows can enable a Bloop configuration setting to keep using full compiles from their editor, while still getting faster compiles than they would if they used the compilation engine from Pants or Bazel.

#3: State-of-the-art compilation engine

Currently, the Scala rules in Bazel and Pants implement their own compilation engine, interface directly with internal Scala compiler APIs and have a high memory and resource usage footprint because they spawn a JVM server that cannot be reused by external build clients.

The advantages of using Bloop to compile Scala code are the following:

Speed. Bloop implements a compilation engine that:
1. is the fastest to date
2. has been tweaked to have the best performance defaults
3. uses build pipelining to speed up full build graph compilations
4. is benchmarked in 20 of the biggest Scala open source projects
5. is continuously improved and maintained by compiler engineers
Supports pure compilation. Bloop can recompile build targets from scratch if it's told to do so by the build tools.
Minimal use of resources. Bloop can be reused by any local build client, including those from other build tools and workspaces.
No maintenance burden. The compilation engine doesn't need to be maintained by either the Bazel or the Pants team.
Simple integration. The integration is done via the Build Server Protocol, which requires only a few hundred lines of code and is decoupled from any change in the compiler binary APIs.

How to integrate with Bloop

There are several ways to integrate Bloop and Pants/Bazel with varying degrees of functionality.

Which integration is best ultimately depends on what clients are and aren't willing to give up and on the key motivations behind the integration. The move from one integration to another can be done gradually.

Barebone integration: only generating `.bloop/`

Bloop loads a build by reading Bloop configuration files from a .bloop/ directory placed in the root workspace directory.

A configuration file is a JSON file that aggregates all of the build inputs Bloop needs to compile, test and run. It is written to a directory in the file system to simplify access and caching when the build tool is not running but other clients are. Every time a configuration file in this directory changes, the Bloop server automatically reloads its build state.

A barebone integration is the simplest Bloop integration: Pants or Bazel generate Bloop configuration files in a .bloop/ directory. Whenever a build target changes, Bazel or Pants regenerate its configuration file.
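As a rough illustration of what "generating a configuration file" could look like from the build tool's side, here is a minimal Python sketch. The helper name `write_bloop_config` is hypothetical, the JSON fields are only a small subset written from memory of the Bloop configuration format, and the `version` value is an assumption, so check all of them against the schema shipped with your Bloop release. A real configuration also needs Scala compiler settings, which are left out here.

```python
import json
from pathlib import Path


def write_bloop_config(workspace, name, sources, dependencies, classpath):
    """Write <workspace>/.bloop/<name>.json and return its path (hypothetical helper)."""
    bloop_dir = Path(workspace) / ".bloop"
    bloop_dir.mkdir(parents=True, exist_ok=True)
    out_dir = bloop_dir / name
    config = {
        # Assumed schema version; use the one your Bloop release expects.
        "version": "1.1.2",
        "project": {
            "name": name,
            "directory": str(Path(workspace) / name),
            "sources": [str(s) for s in sources],
            "dependencies": list(dependencies),
            "classpath": [str(c) for c in classpath],
            "out": str(out_dir),
            "classesDir": str(out_dir / "classes"),
        },
    }
    config_file = bloop_dir / f"{name}.json"
    # Write to a temporary file and rename it, so that a process watching
    # the directory never observes a half-written configuration.
    tmp_file = bloop_dir / f"{name}.json.tmp"
    tmp_file.write_text(json.dumps(config, indent=2))
    tmp_file.replace(config_file)
    return config_file
```

The build tool would call this once per build target and again whenever the inputs of a target change. The rename at the end is a design choice of the sketch, not something Bloop requires.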

Here's a diagram illustrating the barebone integration.

Note that:

Several clients, each operated by a developer, talk to Bloop
The build tool and Bloop use different compilers/state
Bazel/Pants write configuration files, Bloop only reads them
.bloop is the workspace directory where files are persisted

Pros

Easy to prototype (Danny and I implemented it in Pants in 4 hours)
Out-of-the-box integration with Metals and CLI (motivation #1)

Cons

Requires writing all configurations to a .bloop/ directory in the workspace.
The Bloop compiles are not integrated with those of the build tool. Because the build tool and Bloop use their own compilers, this solution doesn't satisfy users that want:
- A faster developer workflow (motivation #2)
- A state-of-the-art compilation engine (motivation #3)

BSP integration: generating `.bloop/` and talking BSP

To enable a solution that not only makes Bazel/Pants usable from any editor but also provides a faster developer workflow than the status quo, we need to look at ways to let Bloop do the heavy lifting of compilation.

In a way, Bazel and Pants become build clients to the BSP build server in Bloop:

A compile in Bazel or Pants maps to a compile request to Bloop
Bazel and Pants receive compilation logs and class files from Bloop

The following diagram illustrates what the architecture looks like:

We can see that Bazel / Pants no longer own compilers and that they instead communicate with the Bloop server via BSP. To implement that, the build tools can use bsp4j, a tiny Java library that implements the protocol and allows the client to listen to all results/notifications from the build server.
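To give an idea of how small the protocol surface is, here is a Python sketch of the message a client would send to ask for a compile. It assumes the BSP `buildTarget/compile` request and the `Content-Length` framing that BSP shares with the Language Server Protocol. The function name and the target URI used below are made up for illustration, and the `build/initialize` handshake that must precede any request is omitted. A JVM build tool would use bsp4j instead of hand-rolling this.

```python
import json


def bsp_compile_request(request_id, target_uris):
    """Return a framed JSON-RPC `buildTarget/compile` request as bytes."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "buildTarget/compile",
        # Each build target is identified by a URI chosen by the server.
        "params": {"targets": [{"uri": uri} for uri in target_uris]},
    }
    body = json.dumps(payload).encode("utf-8")
    # Header and body are separated by an empty line, as in LSP.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body
```

For example, `bsp_compile_request(1, ["file:///workspace/?id=foo"])` yields bytes that can be written to the socket or pipe connected to the server. The response and any notifications come back with the same framing.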

There are, however, two ways Bazel or Pants can offload compilation to Bloop. Let's illustrate both of them with a simple build in which target C depends on B and B depends on A.

The straightforward mechanism to offload compilation is to let the Bazel/Pants build tool drive the compilation itself.

Upon the first compilation of a target C, the build tool would:

Make sure there is an open BSP connection with the Bloop server.
- If not, the Bloop Launcher will start it
Visit C, find dependency B is not compiled.
Visit B, find dependency A is not compiled.
Visit A, no more dependencies, then:
1. Generate configuration file for A
2. Send Bloop compile request for A to write class files
Come back to B, no more uncompiled dependencies, then:
1. Generate configuration file for B
2. Send Bloop compile request for B to write class files
Come back to C, no more uncompiled dependencies, then:
1. Generate configuration file for C
2. Send Bloop compile request for C to write class files

(The build tool can safely visit a build graph in parallel.)
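The walk above amounts to a post-order traversal of the build graph with one compile request per target. Here is a minimal, sequential Python sketch of it. `generate_config` and `send_compile` are hypothetical callbacks standing in for "write the configuration file" and "send a Bloop compile request and wait for its result". Opening the BSP connection and visiting the graph in parallel are left out.

```python
def compile_bottom_up(target, deps, generate_config, send_compile, compiled=None):
    """Compile `target` after its dependencies, one Bloop request per target."""
    compiled = set() if compiled is None else compiled
    if target in compiled:
        return compiled
    # Visit dependencies first so their class files exist before `target` compiles.
    for dep in deps.get(target, []):
        compile_bottom_up(dep, deps, generate_config, send_compile, compiled)
    generate_config(target)
    send_compile(target)
    compiled.add(target)
    return compiled


# The build from the example: C depends on B, which depends on A.
deps = {"C": ["B"], "B": ["A"], "A": []}
log = []
compile_bottom_up(
    "C",
    deps,
    lambda t: log.append("config " + t),
    lambda t: log.append("compile " + t),
)
print(log)
```

For this build the log records a config and a compile step for A, then for B, then for C, which is the order of the steps listed above.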

This mechanism works if one wants the build tool to own and control the way compilations are run, but it's slower than letting Bloop compile a subset of the build graph on its own, where Bloop can (among other actions):

Start the compilation of a project before its dependencies are finished (e.g. start compiling B right after A is typechecked). This is the so-called build pipelining.
Compile faster by populating symbols from in-memory stores instead of reading class files from the file system.
Amortize the cost of starting a compilation by compiling a list of build targets at the same time.

The build tool could benefit from all of these actions by just changing how it maps compilation requests to the Bloop BSP server:

Make sure there is an open BSP connection with the Bloop server.
- If not, the Bloop Launcher will start it
Visit C, find dependency B has no config.
Visit B, find dependency A has no config.
Visit A, no more dependencies, then generate config for A
Come back to B, no more dependencies, generate config for B
Come back to C, no more dependencies, generate config for C
Send a Bloop compile request for C.
- Bloop will start compiling the build graph in the background.
- After building a target, Bloop sends a notification to the client.
Visit B, find dependency A is not compiled.
Visit A, wait for Bloop's end notification for A.
Come back to B, wait for Bloop's end notification for B.
Come back to C, wait for Bloop's end notification for C.

Right after receiving the notifications from the server, the build tool will find all the compilation products written in the classes directory specified in the configuration file, so it can immediately start evaluating tasks that depend on the compilation products of that project.
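The second strategy can be sketched in the same style: generate every configuration first, send a single compile request for the root target, then block on the per-target notifications. The callbacks are again hypothetical. In particular, `send_compile(root, notify)` is assumed to return immediately and to call `notify(target)` whenever the server reports that a target finished (in BSP this would presumably be driven by `build/taskFinish` notifications).

```python
import threading


def topo_order(root, deps):
    """Dependencies-first order of every target reachable from `root`."""
    order, seen = [], set()

    def visit(target):
        if target in seen:
            return
        seen.add(target)
        for dep in deps.get(target, []):
            visit(dep)
        order.append(target)

    visit(root)
    return order


def compile_with_one_request(root, deps, generate_config, send_compile, on_compiled):
    """Generate all configs, send one compile request, then await each target."""
    order = topo_order(root, deps)
    # Phase 1: every configuration exists before Bloop is asked to compile.
    for target in order:
        generate_config(target)
    # One event per target, set when the server reports that target as finished.
    finished = {target: threading.Event() for target in order}
    # Phase 2: a single request for the root lets Bloop schedule the whole graph.
    send_compile(root, lambda target: finished[target].set())
    # Phase 3: walk the graph again, waiting for each notification in turn.
    # A real client would add a timeout and handle failed compilations.
    for target in order:
        finished[target].wait()
        on_compiled(target)
    return order
```

`on_compiled` is where the build tool would start the tasks that depend on a target's class files. Because Bloop owns the scheduling between the request and the notifications, it is free to pipeline and batch the compilations as described above.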

(Sbt will offload compilation to Bloop by following this strategy in the next Bloop release.)

Pros

Out-of-the-box integration with Metals and CLI (motivation #1)
A faster local developer workflow (motivation #2)
A state-of-the-art compilation engine that compiles the build graph as fast as possible for the build tool, with a simple protocol that decouples the build tools from compiler internals

Cons

Not as straightforward to implement as the barebone integration, but still doable and abstracted away from compiler internals.
Requires writing all configurations to a .bloop/ directory in the workspace.

Manual binary dependency

It is possible (but discouraged) to use Bloop's compilation engine via a library dependency and interface directly with Bloop's internal compiler APIs. However, most of the performance advantages of using Bloop will be lost, because they come from how Bloop schedules the compilation of build targets.

Pros

Can yield some compile speedups if the internals are used correctly

Cons

No out-of-the-box integration with Metals and other clients (motivation #1)
Same local developer workflow as now (motivation #2)
Difficult to implement and maintain (similar situation as the status quo)
- Bloop compiler APIs change frequently
- Bloop compiler APIs do not promise binary compatibility

CI compatibility

The CI doesn't pose any integration problems for Bloop. When Bazel runs compilation in the build farm, the Bloop Launcher will open a connection with a Bloop server and start compiling, in a similar way to how the current Scala rules in Bazel or Pants work.

Conclusion

This document motivates an integration with Bloop and explains why build tools such as Pants and Bazel would want to integrate with it and what the consequences are for their users.

This document intentionally goes into not only ideas but also implementation details to show how a full end-to-end integration from Bazel or Pants to Bloop is possible and can be implemented. Despite a few minor improvements missing in the latest Bloop release, build tool engineers could implement an integration that works tomorrow while solving fundamental problems present today.
