Distribute Task Execution (DTE)

Nx speeds up your average CI time with caching and the affected command. But neither of these features help with the worst case scenario. When something at the core of your repo has been modified and every task needs to be run in CI, the only way to improve the performance is by adding more agent jobs and efficiently parallelizing the tasks.

The most obvious way to parallelize tasks is to split tasks up by type: running all tests on one job, all builds on another and all lint tasks on a third. This strategy is called binning. This can be made difficult if some test tasks have build tasks as prerequisites, but assuming you figure out some way to handle that, a typical set up can look like the diagram below. Here the test tasks are delayed until all necessary build artifacts are ready, but the build and lint tasks can start right away.

CI using binning

The problem with the binning approach is you'll end up with some idle time on one or more jobs. Nx's distributed task execution reduces that idle time to the minimum possible by assigning each individual task to agent jobs based on the task's average run time. Nx also guarantees that tasks are executed in the correct order and uses distributed caching to make sure that build artifacts from previous tasks are present on every agent job that needs them.

When you set up Nx's distributed task execution, your task graph will look more like this:

CI using DTE

And not only will CI finish faster, but the debugging experience is the same as if you ran all of your CI on a single job. That's because Nx uses distributed caching to recreate all of the logs and build artifacts on the main job.

Set up

To distribute your task execution, you need to (1) connect to Nx Cloud and (2) enable DTE in your CI workflow. Each of these steps can be enabled with a single command:

nx connect-to-nx-cloud
Use the latest version of Nx Cloud

This command installs the latest version of @nrwl/nx-cloud. The latest version works with any version of Nx >= 13.0.

nx generate @nrwl/workspace:ci-workflow --ci=github

The --ci flag can be github, circleci or azure. For more details on setting up DTE, read this guide.

CI Execution Flow

Distributed task execution can work on any CI provider. You are responsible for launching jobs in your CI system. Nx Cloud then coordinates the way those jobs work together. There are two different kinds of jobs that you'll need to create in your CI system.

  1. One main job that controls what is going to be executed
  2. Multiple agent jobs that actually execute the tasks

The main job execution flow looks like this:

# Coordinate the agents to run the tasks - npx nx-cloud start-ci-run # Run any commands you want here - nx affected --target=lint & nx affected --target=test & nx affected --target=build # Stop any run away agents - npx nx-cloud stop-all-agents

The agent job execution flow is very simple:

# Wait for tasks to execute - npx nx-cloud start-agent

The main job looks more or less the same way as if you haven't used any distribution. The only thing you need to do is to invoke npx nx-cloud start-ci-run at the beginning and optionally invoke npx nx-cloud stop-all-agents at the end.

The agent jobs run long-running start-agent processes that execute all the tasks associated with a given CI run. The only thing you need to do to set them up is to invoke npx nx-cloud start-agent. This process will keep running until Nx Cloud tells it to terminate.

Note it's important that the main job and the agent jobs have the same environment and the same source code. They start around the same time. And, once the main job completes, all the agents will be stopped.

It's also important to note that an Nx Cloud agent isn't a machine but rather a long-running process that runs on a machine. I.e., Nx Cloud doesn't manage your agents--you need to do it in your CI config (check out CI examples below).

Nx Cloud is an orchestrator. The main job tells Nx Cloud what you want to run, and Nx Cloud will distribute those tasks across the agents. Nx Cloud will automatically move files from one agent to another, from the agents to the main job.

The end result is that when say nx affected --target=build completes on the main job, all the file artifacts created on agents are copied over to the main job, as if the main job had built everything locally.

Running Things in Parallel

--parallel is propagated to the agents. E.g., npx nx affected --target=build --parallel=3 --dte tells Nx Cloud to run up to 3 build targets in parallel on each agent. So if you have say 10 agents, you will run up to 30 builds in parallel across all of them.

You also want to run as many commands in parallel as you can. For instance,

- nx affected --target=build - nx affected --target=test - nx affected --target=lint

is worse than

- nx affected --target=build & nx affected --target=test & nx affected --target=lint

The latter is going to schedule all the three commands at the same time, so if an agent cannot find anything to build, it will start running tests and lints. The result is better agent utilization and shorter CI time.

CI/CD Examples

The examples below show how to set up CI using Nx and Nx Cloud using distributed task execution and distributed caching.

Every organization manages their CI/CD pipelines differently, so the examples don't cover org-specific aspects of CI/CD (e.g., deployment). They mainly focus on configuring Nx correctly.

Read the guides for more information on how to configure them in CI.

Note that only cacheable operations can be distributed because they have to be replayed on the main job.

Illustrated Guide

For more details about how distributed task execution works, check out the illustrated guide by Nrwlian Nicole Oliver.

how does distributed task execution work in Nx Cloud?

Relevant Repositories and Examples

Concepts

Recipes

Reference

See Also