Hermetic and portable CI in a flake

A full-Nix CI setup for a Dart/Flutter monorepo where the GitHub Actions workflow YAML is itself generated from Nix, the checks run with no network access in the sandbox, and `nix flake check` is the entire CI surface — the same primitive locally and on the runner.

Why
What runs
The check derivation is the interesting part
The workflow itself is also Nix
What it costs
Where to look

Why

Almost every project I've worked on has a .github/workflows/ci.yml that does roughly the same thing. Install the SDK with a setup action written and maintained by someone else, run pub get against pub.dev on a fresh runner, run the tests, push the result. The green check at the bottom of the PR is technically a proof that the tests passed — in a slightly different toolchain than the one on my laptop, against a slightly different snapshot of the dependencies, on a machine that trusted three external systems to behave the same way they did the last time.

For a small library that's fine. For a small library you want to publish to pub.dev and ask other people to depend on, it is also fine, in the way that a lot of things are fine until they aren't.

I just shipped caffeine 3.0 and redid the CI as strictly Nix. Not Nix wrapping setup-dart, but Nix end to end. The dev shell, the test runner, the lockfile handling, the platform deps, the workflow YAML itself — all materialized by the flake. The GitHub Actions runner does the smallest possible job: install Nix, evaluate a matrix from the flake, fan out one runner per check, go home.

What runs

nix flake show, trimmed:

github:purplenoodlesoop/caffeine
├───apps.x86_64-linux.sync-ci
├───checks.x86_64-linux
│   ├───caffeine
│   ├───flutter-caffeine
│   └───ci-up-to-date
├───devShells.x86_64-linux.default
├───githubActions
└───packages.x86_64-linux.ci-yaml

Three checks (one per package, plus a drift check on the workflow file itself), one dev shell, one package that is the rendered workflow YAML, one app that copies the rendered YAML into the working tree. githubActions is the matrix the workflow consumes — generated by nix-community/nix-github-actions from the checks attribute set, so the matrix and the actual list of checks share one source.
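The wiring between checks and githubActions can be sketched as follows. This is a minimal flake under stated assumptions, not the project's actual flake.nix: the nixpkgs branch, the `follows` line and the placeholder `example` check are illustrative, while `mkGithubMatrix` is the entry point that nix-github-actions documents.

```nix
{
  inputs = {
    # Assumption: any reasonably recent nixpkgs works here.
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    nix-github-actions = {
      url = "github:nix-community/nix-github-actions";
      inputs.nixpkgs.follows = "nixpkgs";
    };
  };

  outputs = { self, nixpkgs, nix-github-actions }: {
    # Hypothetical placeholder; the real flake defines caffeine,
    # flutter-caffeine and ci-up-to-date here.
    checks.x86_64-linux.example = nixpkgs.legacyPackages.x86_64-linux.hello;

    # The matrix is derived from the same attribute set that
    # `nix flake check` evaluates, restricted to the systems CI runs on.
    githubActions = nix-github-actions.lib.mkGithubMatrix {
      checks = nixpkgs.lib.getAttrs [ "x86_64-linux" ] self.checks;
    };
  };
}
```

With something like this in place, `nix eval --json '.#githubActions.matrix'` is what the matrix job feeds into `fromJSON`.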

The workflow definition lives in nix/ci.nix:

let
  checkoutStep = { name = "Checkout"; uses = "actions/checkout@v4"; };
  installNixStep = {
    name = "Install Nix";
    uses = "DeterminateSystems/nix-installer-action@v22";
    "with".determinate = false;
  };
  magicCacheStep = {
    name = "Magic Nix Cache";
    uses = "DeterminateSystems/magic-nix-cache-action@v13";
  };
in
{
  name = "ci";
  on = {
    push.branches = [ "master" ];
    pull_request = { };
    workflow_dispatch = { };
  };
  jobs = {
    matrix = {
      runs-on = "ubuntu-latest";
      outputs.matrix = "\${{ steps.set-matrix.outputs.matrix }}";
      steps = [
        checkoutStep
        installNixStep
        {
          id = "set-matrix";
          name = "Generate matrix";
          run = ''
            matrix="$(nix eval --json '.#githubActions.matrix')"
            echo "matrix=$matrix" >> "$GITHUB_OUTPUT"
          '';
        }
      ];
    };

    check = {
      name = "\${{ matrix.name }}";
      needs = "matrix";
      runs-on = "\${{ matrix.os }}";
      strategy = {
        fail-fast = false;
        matrix = "\${{ fromJSON(needs.matrix.outputs.matrix) }}";
      };
      steps = [
        checkoutStep
        installNixStep
        magicCacheStep
        { name = "Build check"; run = "nix build -L '.#\${{ matrix.attr }}'"; }
      ];
    };
  };
}

A Nix attrset. Triggers, concurrency, jobs, steps — all just fields. The matrix job evaluates the flake and emits one entry per check. The check job fans out — caffeine, flutter-caffeine, ci-up-to-date each on its own runner, each as its own status check in the PR. Adding a new check to flake.nix adds a status line to the next PR without touching this file.
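The listing above shows triggers and jobs but no concurrency group. If one is wanted, it is one more field on the same attrset. A minimal sketch, assuming GitHub Actions' standard `concurrency` keys; the group name is hypothetical:

```nix
{
  # Merged into the top-level attrset of nix/ci.nix, next to `name` and `on`.
  concurrency = {
    # \${{ keeps Nix from interpolating, so the rendered YAML carries a
    # literal GitHub Actions expression.
    group = "ci-\${{ github.ref }}";
    cancel-in-progress = true;
  };
}
```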

The runner has zero direct knowledge of dart, flutter, pub, package_config.json or the workspace layout. It does nix build -L .#<attr> and reports the exit code.

The check derivation is the interesting part

checks.<system>.caffeine is a buildDartApplication derivation whose checkPhase runs the tests inside the Nix sandbox. The sandbox has no network. Everything has to be in the closure already.

Three things had to be true at the same time.

First, pub2nix needs the workspace lockfile as a Nix value, and it is YAML on disk. Convert it inline:

runCommand "${pname}-pubspec.lock.json"
  { nativeBuildInputs = [ yq-go ]; }
  "yq -o=json . ${src}/pubspec.lock > $out"

This is import-from-derivation, so nix flake check pauses briefly to materialize the JSON before evaluating the rest. Slightly slower, never wrong, no committed pubspec.lock.json to keep in sync with the YAML one.
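How the converted lockfile reaches the builder can be sketched like this. The function arguments and the `pubspecLockJson` name are assumptions for illustration; `pubspecLock` is the argument nixpkgs documents for `buildDartApplication`, and the real builder in nix/mk-check.nix may pass more than is shown here.

```nix
{ lib, runCommand, yq-go, buildDartApplication, pname, version, src }:

let
  # Convert the YAML lockfile to JSON inside a derivation.
  pubspecLockJson = runCommand "${pname}-pubspec.lock.json"
    { nativeBuildInputs = [ yq-go ]; }
    "yq -o=json . ${src}/pubspec.lock > $out";
in
buildDartApplication {
  inherit pname version src;

  # Reading a derivation's output during evaluation forces it to be built
  # first; this is the import-from-derivation pause.
  pubspecLock = lib.importJSON pubspecLockJson;
}
```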

Second, dart test calls pub.Entrypoint.ensureUpToDate at startup. That function compares the on-disk lockfile against the generated package_config.json and reaches for pub.dev if it doesn't like what it sees. There is no --no-pub flag on dart test. The trick is packageRun, a tiny bash helper that dartConfigHook already installs into the build environment:

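checkPhase = ''
  # Work on a writable copy of the source tree with symlinks dereferenced.
  cp -rL . "$TMPDIR/src"
  chmod -R u+w "$TMPDIR/src"
  # Place the generated .dart_tool next to the package under test.
  cp -rL "$TMPDIR/src/.dart_tool" "$TMPDIR/src/packages/caffeine/"
  cd "$TMPDIR/src/packages/caffeine"
  # packageRun invokes the Dart VM directly, so pub is never on the call path.
  HOME=$TMPDIR packageRun test --reporter expanded
'';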

packageRun test resolves to dart --packages=.dart_tool/package_config.json <test-pkg>/bin/test.dart. No pub on the call path, no ensureUpToDate, no network probe. Just the Dart VM running the test runner with an explicit package config.

Third, the workspace has to be invisible at test time. Otherwise dart test walks into the Flutter example as a sibling workspace member and demands the Flutter SDK. After dartConfigHook has built package_config.json (which needs the workspace declaration so workspace members get added to the config), strip both keys:

sed -i '/^workspace:$/,$d' pubspec.yaml
sed -i '/^resolution: workspace$/d' packages/caffeine/pubspec.yaml

Order matters. Strip in postPatch and workspacePackageConfigScript sees no workspace, so it never adds caffeine to the config: package:caffeine resolves to nothing and the tests fail to compile. Strip in checkPhase and you keep the config but lose the workspace at test time. The fix is the second option.
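Putting the ordering together, a checkPhase that strips the workspace keys only after the package config exists might look like the sketch below. The exact position of the sed calls relative to the copy is an assumption; the source only establishes that they run in checkPhase, after dartConfigHook.

```nix
{
  checkPhase = ''
    # By this point dartConfigHook has generated .dart_tool/package_config.json
    # from the intact workspace declaration.
    cp -rL . "$TMPDIR/src"
    chmod -R u+w "$TMPDIR/src"
    cd "$TMPDIR/src"

    # Deletes from the `workspace:` line to the end of the file, so this
    # relies on `workspace:` being the last top-level key in pubspec.yaml.
    sed -i '/^workspace:$/,$d' pubspec.yaml
    sed -i '/^resolution: workspace$/d' packages/caffeine/pubspec.yaml

    # Same steps as before: config next to the package, then run the tests.
    cp -rL .dart_tool packages/caffeine/
    cd packages/caffeine
    HOME=$TMPDIR packageRun test --reporter expanded
  '';
}
```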

The workflow itself is also Nix

The .github/workflows/ci.yml file is a generated artifact:

ci-yaml-body = (pkgs.formats.yaml { }).generate "ci-body.yml"
  (import ./nix/ci.nix);

ci-yaml = pkgs.runCommand "ci.yml" { } ''
  cat > $out <<'HEADER'
  # AUTO-GENERATED from nix/ci.nix — DO NOT EDIT BY HAND.
  # Regenerate with: nix run .#sync-ci
  HEADER
  cat ${ci-yaml-body} >> $out
'';

nix/ci.nix is the workflow as a Nix attrset: triggers, concurrency group and steps are all plain fields such as name = "..." and on = { ... }. pkgs.formats.yaml renders it, and the wrapping runCommand prepends the do-not-edit header so the rendered file explains how it got there.

apps.sync-ci is a writeShellApplication that copies the rendered YAML into .github/workflows/ci.yml at the working tree's root. It has to be a real committed file — GitHub Actions runners can't follow /nix/store/... paths, so we generate and commit, not symlink.
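A sync-ci app along these lines could be written as follows. This is a sketch rather than the project's definition: locating the root with `git rev-parse` and copying with `install` are assumptions, and `ci-yaml` stands for the rendered workflow derivation shown above.

```nix
{ pkgs, ci-yaml }:

let
  sync-ci = pkgs.writeShellApplication {
    name = "sync-ci";
    runtimeInputs = [ pkgs.coreutils pkgs.git ];
    text = ''
      # Resolve the working tree root so the app works from any subdirectory.
      root="$(git rev-parse --show-toplevel)"
      # Store paths are read-only; install writes a normal 0644 copy and
      # creates .github/workflows if it is missing.
      install -D -m 0644 ${ci-yaml} "$root/.github/workflows/ci.yml"
    '';
  };
in
{
  # Shape expected under apps.<system>.sync-ci in the flake outputs.
  type = "app";
  program = "${sync-ci}/bin/sync-ci";
}
```

Copying instead of symlinking matches the constraint above: the result is an ordinary file that can be committed.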

Drift is caught by another check. checks.<sys>.ci-up-to-date is a runCommand that diffs the committed file against the rendered one:

ci-up-to-date = pkgs.runCommand "ci-up-to-date" { } ''
  if ! diff -u ${./.github/workflows/ci.yml} ${ci-yaml}; then
    echo "❌ .github/workflows/ci.yml is out of sync with nix/ci.nix."
    echo "   Run: nix run .#sync-ci"
    exit 1
  fi
  touch $out
'';

If somebody hand-edits the workflow, the next CI run fails with a message telling them to regenerate. If somebody edits nix/ci.nix without running sync-ci, same failure. There is no third option, which is the whole point.

What it costs

Honest numbers. Cold run, empty cache: ~10–12 minutes. About 8 of those go to the post-job step uploading freshly realized /nix/store paths to GitHub Actions Cache (the Flutter SDK alone, even though we never flutter build, contributes ~150 MB of pre-bundled Linux engine artifacts — buildFlutterApplication's Linux variant pulls in wrapGAppsHook3 and friends regardless of whether you actually build anything for Linux). Warm run, no dependency changes: each check finishes in ~1–2 minutes on its own runner, ~3 minutes wall time including the matrix-evaluation job.

For comparison, the same project with subosito/flutter-action and dart test runs in ~1–2 minutes cold and ~45 seconds warm. Strictly Nix is roughly 5× slower cold and 3× slower warm. That cost is real and not going away — sandboxed builds, plus each matrix runner installing Nix from scratch, plus magic-nix-cache's upload phase, are the price.

What you get for the price is small but specific. The same nix build .#<attr> invocation runs identically on my Mac and on ubuntu-latest. Setting up the project on a fresh machine is git clone && nix develop and the toolchain is at the version pinned in flake.lock, not at whatever setup-dart@v1 resolves to today. When something breaks, the breakage is reproducible — which is the only kind of breakage worth fixing.

The value of doing CI this way only shows up the day something does break. If your hermetic test never tells you anything you didn't already know, it isn't hermetic. The whole pipeline has to be tight enough that the next surprise it produces is reproducible.

Where to look

flake.nix — top-level wiring, dev shell, checks, packages, apps, githubActions matrix
nix/mk-check.nix — shared check builder used by both packages
nix/ci.nix — workflow as a Nix attrset
.github/workflows/ci.yml — the generated file (do not edit)

yakov.codes