Development & Testing

How carrick is built, tested, and validated. If you want to contribute or understand the engineering methodology, start here.

Prerequisites

You need macOS 14 (Sonoma) or later on Apple Silicon with Hypervisor.framework support (sysctl kern.hv_support should return 1). You also need:

Building

Carrick uses Apple's Hypervisor.framework, which requires the com.apple.security.hypervisor entitlement. cargo build strips the code signature on macOS, so a bare build produces a binary that fails on every run with HV_DENIED (0xfae94007). Always build through the signed build script.

# Build + codesign the release binary (the only runnable build)
$ just build

# Build + sign, then run immediately
$ just run run ubuntu:24.04 /bin/echo hi

# Fast unsigned debug build — compile-checking only, cannot run a guest
$ just check

The signed binary lives at target/release/carrick. The build script (scripts/build-signed.sh) handles shared CARGO_TARGET_DIR setups by atomically copying and signing per-worktree, so concurrent worktrees never clobber each other's signature.
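The copy-then-publish idea behind a per-worktree copy can be sketched in a few lines. This is only an illustration of the general technique (stage the file next to its destination, then rename it into place); `atomic_copy` and the file names are hypothetical and are not taken from scripts/build-signed.sh, whose actual mechanics are not shown here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copy `src` to `dest` without ever exposing a half-written file at `dest`.
/// Hypothetical helper for illustration only.
fn atomic_copy(src: &Path, dest: &Path) -> io::Result<()> {
    // Stage next to the destination so the final rename stays on one filesystem.
    let staged = dest.with_extension("staged");
    fs::copy(src, &staged)?;
    // A signing step would operate on `staged` here, before it becomes visible.
    fs::rename(&staged, dest)
}

fn main() -> io::Result<()> {
    // Work in a scratch directory so the demo leaves nothing behind.
    let dir = std::env::temp_dir().join(format!("atomic-copy-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let src = dir.join("shared-build-output");
    let dest = dir.join("worktree-binary");
    fs::write(&src, b"binary bytes")?;
    atomic_copy(&src, &dest)?;
    println!("copied {} bytes", fs::read(&dest)?.len());
    fs::remove_dir_all(&dir)
}
```

Because the rename is the only step that touches the destination path, a concurrent reader sees either the old file or the complete new one.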

The no-panic lint gate

The supervisor must never crash on guest input. The following panicking constructs are denied crate-wide via [workspace.lints.clippy] in Cargo.toml:

# Cargo.toml — workspace lint config
[workspace.lints.clippy]
unwrap_used    = "deny"
expect_used    = "deny"
panic          = "deny"
todo           = "deny"
unimplemented  = "deny"

Test code is exempt via clippy.toml (allow-unwrap-in-tests = true). A handful of audited, provably-infallible sites carry a targeted #[allow(...)] with an // INVARIANT: comment. Release builds also run with overflow-checks = true, so arithmetic on guest-controlled integers traps to a contained abort rather than wrapping silently.
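As a rough illustration of what code written under this gate tends to look like, the sketch below shows two patterns: checked arithmetic on a guest-controlled value, and a narrowly scoped allow with an INVARIANT comment. Both functions are hypothetical examples, not code from the carrick source tree.

```rust
/// Hypothetical helper: end address of a guest-supplied (addr, len) range.
/// `checked_add` reports overflow as `None`, so hostile values become an error
/// the caller can map to an errno instead of a panic or a silent wrap.
fn guest_range_end(addr: u64, len: u64) -> Option<u64> {
    addr.checked_add(len)
}

/// Hypothetical audited site: the lint is allowed for this one function only.
#[allow(clippy::unwrap_used)]
fn low_nibble_hex(byte: u8) -> char {
    // INVARIANT: `byte & 0x0f` is always in 0..=15, which is a valid digit in
    // radix 16, so `from_digit` cannot return `None` here.
    char::from_digit(u32::from(byte & 0x0f), 16).unwrap()
}

fn main() {
    println!("{:?}", guest_range_end(0x1000, 0x10));
    println!("{:?}", guest_range_end(u64::MAX, 1));
    println!("{}", low_nibble_hex(0xab));
}
```

The point of the comment is that a reviewer can check the stated invariant locally, without reading any caller.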

# Run the lint gate
$ just clippy

# Or directly
$ cargo clippy --workspace --all-targets -- -D warnings

Testing layers

Carrick has three distinct testing layers, each catching different classes of bugs:

| Layer | Command | What it tests | Needs HVF/Docker? |
| --- | --- | --- | --- |
| Unit / integration tests | just test | Pure logic: ABI definitions, flag translation, struct layouts, syscall argument parsing | No |
| Conformance probes | just conformance | Syscall behavior diffed against a Docker linux/arm64 oracle; catches ABI fidelity bugs | Yes (both) |
| Integration suites | Per-runtime scripts | Go, CPython, libuv, Node.js full test suites under carrick vs Docker | Yes (both) |

Unit and integration tests

# Host-only tests — no HVF, no Docker, runs anywhere
$ just test

# Or target a specific crate
$ cargo test -p carrick-runtime --lib

These test the translation logic in isolation: struct packing, flag constant mappings, signal number translation, error code conversion, etc. They run on any machine and are fast.
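A host-only test of this kind might look like the sketch below, which translates a few Linux open(2) flag bits to their Darwin equivalents. The function name, the table, and the choice to reject unknown bits are assumptions made for illustration; the constants are the asm-generic Linux values and the Darwin fcntl.h values, and only a small subset is covered.

```rust
// Linux (asm-generic, as used on aarch64) open(2) flag bits.
const LINUX_O_CREAT: u32 = 0o100;
const LINUX_O_TRUNC: u32 = 0o1000;
const LINUX_O_APPEND: u32 = 0o2000;
const LINUX_O_NONBLOCK: u32 = 0o4000;

// Darwin open(2) flag bits.
const DARWIN_O_NONBLOCK: u32 = 0x0004;
const DARWIN_O_APPEND: u32 = 0x0008;
const DARWIN_O_CREAT: u32 = 0x0200;
const DARWIN_O_TRUNC: u32 = 0x0400;

// (linux bit, darwin bit) pairs; deliberately incomplete.
const FLAG_MAP: [(u32, u32); 4] = [
    (LINUX_O_CREAT, DARWIN_O_CREAT),
    (LINUX_O_TRUNC, DARWIN_O_TRUNC),
    (LINUX_O_APPEND, DARWIN_O_APPEND),
    (LINUX_O_NONBLOCK, DARWIN_O_NONBLOCK),
];

/// Hypothetical translator: returns the Darwin flags, or the untranslated
/// Linux bits as an error so the caller can report them.
fn translate_open_flags(linux: u32) -> Result<u32, u32> {
    // The access mode (O_RDONLY=0, O_WRONLY=1, O_RDWR=2) is the same on both.
    let mut darwin = linux & 0o3;
    let mut remaining = linux & !0o3;
    for (linux_bit, darwin_bit) in FLAG_MAP {
        if remaining & linux_bit != 0 {
            darwin |= darwin_bit;
            remaining &= !linux_bit;
        }
    }
    if remaining == 0 { Ok(darwin) } else { Err(remaining) }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn creat_trunc_wronly_maps_bit_by_bit() {
        let linux = 0o1 | LINUX_O_CREAT | LINUX_O_TRUNC;
        assert_eq!(translate_open_flags(linux), Ok(0x1 | DARWIN_O_CREAT | DARWIN_O_TRUNC));
    }
}

fn main() {
    println!("{:?}", translate_open_flags(0o1 | LINUX_O_CREAT | LINUX_O_TRUNC));
}
```

Returning the leftover bits rather than silently dropping them keeps an unsupported flag visible in the test failure.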

Conformance probes — the core methodology

This is the heart of carrick's quality assurance. The approach is:

  1. Discover a gap using LTP, a runtime test suite, or a real workload failure.
  2. Write a probe — a standalone Linux ELF binary that exercises the specific invariant.
  3. Diff against Docker — run the probe under both carrick and Docker linux/arm64 and compare stdout byte-for-byte.
  4. Fix the gap in the runtime.
  5. Gate — the probe now permanently prevents regression.

The probe is the deliverable; the LTP match count is just confirmation. Today there are 289 owned probes (298 binaries) covering 502 curated LTP tests at 100%.
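The byte-for-byte comparison in step 3 is simple enough to sketch. The code below is a hypothetical stand-in for the comparison, not the gate's actual implementation: it takes the two captured stdout buffers and reports either a match or the first line where they diverge.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Match,
    /// 1-based number of the first line that differs.
    Diff { line: usize },
}

/// Compare probe output from carrick against the oracle, byte for byte.
/// Hypothetical helper for illustration only.
fn compare_stdout(carrick: &[u8], oracle: &[u8]) -> Verdict {
    if carrick == oracle {
        return Verdict::Match;
    }
    // The buffers differ somewhere; walk both line by line to locate it.
    let mut ours = carrick.split(|&c| c == b'\n');
    let mut theirs = oracle.split(|&c| c == b'\n');
    let mut line = 1;
    loop {
        match (ours.next(), theirs.next()) {
            (Some(a), Some(b)) if a == b => line += 1,
            // Either the lines differ or one side ended early.
            _ => return Verdict::Diff { line },
        }
    }
}

fn main() {
    let oracle = b"sa_install_restore=true\nsa_bad_addr_efault=true\n";
    let carrick = b"sa_install_restore=true\nsa_bad_addr_efault=false\n";
    println!("{:?}", compare_stdout(carrick, oracle));
}
```

No normalization is applied on purpose: trailing whitespace or a missing newline counts as a difference, which is what "byte-for-byte" implies.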

Probe structure

Probes live in conformance-probes/src/bin/. Each is a standalone Rust binary cross-compiled to aarch64-unknown-linux-musl (statically linked, no libc dependency on the host). Output must be deterministic — booleans and equalities only, never timestamps, PIDs, or addresses.

// conformance-probes/src/bin/example.rs
//! Verifies that dup(2) returns the lowest available fd.
//! Stands in for LTP pipe_close_stdout_read_stdin.

use conformance_probes::report;

fn main() {
    unsafe {
        // Close stdin, then dup stdout: the new fd must be 0, the lowest free slot.
        libc::close(0);
        let fd = libc::dup(1);
        report!(dup_returns_lowest_fd = (fd == 0));
    }
}
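The determinism rule is easiest to see with a value that changes on every run. The sketch below uses a hypothetical `report_line` function as a stand-in for the crate's report! macro (whose real definition is not shown in this document) and prints only facts about the PID, never the PID itself.

```rust
/// Hypothetical stand-in for the probe reporting macro: one `name=bool` line.
fn report_line(name: &str, ok: bool) -> String {
    format!("{name}={ok}")
}

fn main() {
    let first = std::process::id();
    let second = std::process::id();

    // Nondeterministic, so it would never diff cleanly against the oracle:
    // println!("pid={first}");

    // Deterministic: the same lines are printed on every run and every host.
    println!("{}", report_line("pid_is_nonzero", first != 0));
    println!("{}", report_line("pid_is_stable", first == second));
}
```

Reducing each observation to a boolean keeps the diff meaningful: a mismatch always points at a behavioral difference rather than at run-to-run noise.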

Building probes

Probes are built inside a Docker container — no host cross-compiler needed:

$ scripts/build-probes.sh
probes built: conformance-probes/target/aarch64-unknown-linux-musl/release/

This runs cargo build --release --target aarch64-unknown-linux-musl inside a rust:alpine linux/arm64 container, producing static ELF binaries.

Running a single probe

# Diff one probe against Docker — the FAITHFUL path (matches the CI gate)
$ scripts/run-probe.sh signals
MATCH signals
  sa_install_restore=true
  sa_bad_addr_efault=true
  ...

run-probe.sh base64-encodes the probe binary, pipes it into carrick run <image> /bin/sh -c 'base64 -d > /tmp/p && chmod +x /tmp/p && /tmp/p', and does the same under Docker. It diffs stdout and prints MATCH or DIFF. This is the same path the CI gate uses — do not verify probes via carrick run-elf alone, which uses a different (lighter, single-threaded) execution path.

Running the full gate

# Run all probes — builds the signed binary first
$ just conformance

# Or directly
$ cargo test -p carrick-cli --test conformance -- --nocapture

Scaffolding a new probe

$ scripts/new-probe.sh myprobe
created conformance-probes/src/bin/myprobe.rs
next: edit, then scripts/build-probes.sh && docker-run to diff

This creates a skeleton with the standard imports and doc comment structure. The workflow after editing is:

# 1. Build probes
$ scripts/build-probes.sh

# 2. Verify against Docker oracle
$ scripts/run-probe.sh myprobe

# 3. If DIFF → fix the runtime, rebuild carrick, re-run
$ just build && scripts/run-probe.sh myprobe

# 4. Once MATCH → run the full gate
$ just conformance

DTrace tracing

Carrick wires static USDT probes at the syscall translation boundary. When debugging a guest, carrick trace exposes the real Linux→Darwin call flow:

$ sudo carrick trace run alpine:latest /bin/echo hi
[carrick] VM created, vCPU at EL0
[svc #0] sys_write(1, 0x4002c000, 3) → Darwin write(1, "hi\n", 3) = 3
[svc #0] sys_exit_group(0)
[carrick] Process exited, status=0

The scripts/ directory contains dozens of specialized DTrace scripts for deep subsystem tracing: trace-futex.d, trace-mmap.d, trace-fork-quiesce.d, trace-node-worker-events.d, etc. These are invaluable when reducing a new workload failure to its root cause.

Compatibility report

Before filing an issue about a binary that doesn't work, run the compatibility report to identify which syscalls are missing:

$ carrick compat-report -- /usr/bin/find / -name '*.so'

This instruments the guest via USDT probes and aggregates unhandled or partially-implemented syscalls and /proc paths.

License policy

Carrick is dual-licensed as Apache-2.0 OR MIT. Dependencies are gated by cargo-deny via deny.toml, which allows only permissive licenses: MIT, Apache-2.0, BSD, ISC, Unicode-3.0, Zlib, Unlicense, 0BSD, BSL-1.0, and CDLA-Permissive-2.0.

Build performance

carrick-runtime is a single large crate (~65k lines). The workspace links 26 integration-test binaries plus the CLI. With macOS's default ld64, an incremental rebuild after a one-line runtime edit can spend significant time in the linker.

Do not switch to LLVM lld. lld's Mach-O port drops the __DATA,__dof_carrick section that the usdt crate reads for carrick trace probes. Verify with otool -l target/release/carrick | grep dof if you experiment with linker changes.

The profiles are tuned for the inner loop: dev uses split-debuginfo = "unpacked" to skip dsymutil packaging; dev-fast further drops type/variable DWARF to line-tables-only while keeping backtraces functional.

Quick reference

| Task | Command |
| --- | --- |
| Build + sign | just build |
| Build + run | just run run ubuntu:24.04 /bin/echo hi |
| Compile check (no sign) | just check |
| No-panic lint gate | just clippy |
| Format check | just fmt-check |
| Unit tests | just test |
| Build conformance probes | scripts/build-probes.sh |
| Run one probe | scripts/run-probe.sh <name> |
| Scaffold a new probe | scripts/new-probe.sh <name> |
| Full conformance gate | just conformance |
| Re-sign binary | just sign |
| Live syscall trace | sudo carrick trace run <image> <cmd> |
| Compatibility report | carrick compat-report -- <cmd> |