Testing
Luximm's correctness story rests on parity tests: for every registered variant, the Julia forward must match timm's forward on the same input and weights. The bar is two-tier:
- Logits are checked at an absolute max-abs-diff under TOL = 1f-3. The classifier head is shallow and tightly bounded, so an absolute ceiling is the meaningful end-to-end guarantee.
- Features (forward_features and the in_chans=1 companion) are checked at a relative bar, max-abs-diff / max-abs(timm ref) under FEATURES_RTOL = 1f-3. Deep backbones accumulate FP32 rounding through dozens of stages, which inflates raw pre-norm feature diffs by a factor that scales with depth and channel width, even when downstream logits stay tight. A relative bar keeps the check scale-free across tiny through huge variants.
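The two bars can be sketched as follows. This is an illustrative Python sketch over flat lists of floats, not the suite's own code (the real checks live in the Julia test files). The helper names max_abs_diff, logits_ok, and features_ok are hypothetical; only the threshold values and the two formulas come from the text above, and the zero-reference guard is an added assumption.

```python
TOL = 1e-3            # absolute ceiling for logits (1f-3 in the Julia suite)
FEATURES_RTOL = 1e-3  # relative ceiling for features


def max_abs_diff(a, b):
    """Largest elementwise |a - b| over two equal-length flat sequences."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return max(abs(x - y) for x, y in zip(a, b))


def logits_ok(julia_out, timm_ref, tol=TOL):
    """Absolute bar: max-abs-diff must stay under tol."""
    return max_abs_diff(julia_out, timm_ref) < tol


def features_ok(julia_out, timm_ref, rtol=FEATURES_RTOL):
    """Relative bar: max-abs-diff divided by max-abs of the timm reference."""
    diff = max_abs_diff(julia_out, timm_ref)
    scale = max(abs(y) for y in timm_ref)
    if scale == 0.0:
        # Assumption: an all-zero reference only passes on an exact match.
        return diff == 0.0
    return diff / scale < rtol
```

With a reference of [250.0, -400.0] and an output of [250.1, -400.0], the raw diff of roughly 0.1 fails the absolute bar but passes the relative one, since 0.1 / 400 is well under 1e-3. That is the situation the relative bar is meant to handle for large-magnitude backbone features.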
This page covers the layout of the test suite, how to scope a run to a single variant, how to dump the HDF5 fixture a parity test consumes, and how production CI runs the sweep.
Test layout
Everything lives under test/:
test/
├── runtests.jl # entry point
├── _filter.jl # env-var-driven family/variant filtering
├── parity/ # Python sidecars that produce HDF5 fixtures
│ ├── _dump_common.py
│ ├── dump_resnet_io.py
│ ├── dump_resnetv2_bit_io.py
│ ├── dump_convnext_io.py
│ └── dump_convnextv2_io.py
├── test_resnet.jl # ResNet parity sweep
├── test_bit_resnet.jl # BiT ResNetV2 parity sweep
├── test_convnext.jl # ConvNeXt v1 parity sweep
├── test_convnextv2.jl # ConvNeXtV2 parity sweep
├── test_init.jl # Init-recipe parity (random-init)
├── test_hf_download.jl # Raw HF download path
└── test_hf_hub_download.jl # HF Hub cache layout
runtests.jl consults _filter.jl to decide which family files to include. Each family file iterates over a tuple of variant keys and either runs a parity test against an existing HDF5 fixture or skips the variant if its fixture is missing under data/parity/.
The three "infra" test files (test_init.jl, test_hf_download.jl, test_hf_hub_download.jl) cover the cross-cutting concerns: init recipes, raw HuggingFace downloads, and the Hub cache layout. They run as the infra family.
Scoping a run with environment variables
A full parity sweep downloads every released checkpoint and runs a forward through every variant. That is the right thing to do on the CI server, but on a developer machine it is overkill. Two environment variables narrow what runs without editing files:
- JIMM_TEST_VARIANTS: comma-separated variant keys (e.g. convnextv2_atto_fcmae). Setting this alone also restricts the active families to whichever ones contain the listed variants and drops the infra family, so a single variant key is enough to scope a run.
- JIMM_TEST_FAMILIES: comma-separated list of families. Recognized values are infra, bit, resnet, convnext, convnextv2. When set, this is authoritative and overrides the family inference from JIMM_TEST_VARIANTS. When both variables are unset, every family runs.
# Just the ConvNeXtV2 atto fcmae parity tests, nothing else:
JIMM_TEST_VARIANTS=convnextv2_atto_fcmae \
julia --project -e 'using Pkg; Pkg.test()'
# Include the infra checks alongside a single backbone variant:
JIMM_TEST_FAMILIES=infra,convnextv2 \
JIMM_TEST_VARIANTS=convnextv2_atto_fcmae \
julia --project -e 'using Pkg; Pkg.test()'
Parity tests also skip when their HDF5 fixture is missing under data/parity/, so a contributor can dump one variant's fixture (and optionally its _in1c companion) and run just that one without touching the test code.
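The precedence between the two variables can be sketched as below. This is a Python illustration of the rules described in this section, not a port of _filter.jl. The function active_families, the helper _split, and the EXAMPLE_REGISTRY mapping are hypothetical, and silently ignoring unrecognized family names is an assumption.

```python
ALL_FAMILIES = ("infra", "bit", "resnet", "convnext", "convnextv2")

# Hypothetical registry of which variant keys belong to which family.
EXAMPLE_REGISTRY = {
    "resnet": {"resnet18_a1_in1k"},
    "convnextv2": {"convnextv2_atto_fcmae"},
}


def _split(value):
    """Parse a comma-separated env value into a list of non-empty tokens."""
    return [tok.strip() for tok in (value or "").split(",") if tok.strip()]


def active_families(env, variants_by_family):
    """Return the families that would run for a given environment mapping."""
    families = _split(env.get("JIMM_TEST_FAMILIES"))
    variants = _split(env.get("JIMM_TEST_VARIANTS"))
    if families:
        # JIMM_TEST_FAMILIES is authoritative when set.
        # Assumption: names outside ALL_FAMILIES are ignored.
        return [f for f in ALL_FAMILIES if f in families]
    if variants:
        # Infer families from the listed variants and drop "infra".
        return [
            f for f in ALL_FAMILIES
            if f != "infra"
            and any(v in variants_by_family.get(f, ()) for v in variants)
        ]
    # Neither variable set: every family runs.
    return list(ALL_FAMILIES)
```

Calling active_families with only JIMM_TEST_VARIANTS=convnextv2_atto_fcmae yields just the convnextv2 family, while adding JIMM_TEST_FAMILIES=infra,convnextv2 brings infra back, mirroring the two shell invocations above.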
Parity fixtures
A parity fixture is an HDF5 file produced by one of the Python sidecars under test/parity/. It contains:
- /input: deterministic torch.randn input in PyTorch NCHW layout.
- /state_dict/<key>: every PyTorch parameter, keyed by its state_dict name.
- /output/features: the result of model.forward_features(input).
- /output/logits (when the variant ships a trained head): the result of model.forward(input).
Fixtures live in data/parity/ and follow the naming convention <variant_key>_io.h5 (and <variant_key>_in1c_io.h5 for single-channel variants). The directory is gitignored; fixtures are not redistributed.
The Julia side consumes fixtures via Luximm.Interop.read_parity, which returns a NamedTuple (input, state_dict, output) with all arrays already axis-reversed from PyTorch NCHW into Lux's WHCN layout. The mapping function for the family then routes each state-dict key into the corresponding Lux parameter path via apply_state_dict.
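To make the axis reversal concrete, the sketch below shows what turning NCHW into WHCN means for shapes and element addresses. It does not describe how read_parity is implemented. It only illustrates the general fact that a row-major buffer of shape (N, C, H, W) and a column-major array of shape (W, H, C, N) address the same elements at the same linear offsets. All function names are hypothetical.

```python
def reverse_axes_shape(shape):
    """(N, C, H, W) -> (W, H, C, N)."""
    return tuple(reversed(shape))


def row_major_offset(index, shape):
    """Linear offset of `index` in a row-major (last axis fastest) layout."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def col_major_offset(index, shape):
    """Linear offset of `index` in a column-major (first axis fastest) layout."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + i
    return offset


nchw_shape = (1, 3, 224, 224)
whcn_shape = reverse_axes_shape(nchw_shape)

# The element at (n, c, h, w) in the row-major view sits at the same offset
# as the element at (w, h, c, n) in the column-major view.
n, c, h, w = 0, 2, 10, 17
same = row_major_offset((n, c, h, w), nchw_shape) == col_major_offset(
    (w, h, c, n), whcn_shape
)
```

Indices here are zero-based for simplicity; Julia arrays are one-based, but the correspondence between the two layouts is unchanged.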
Dumping a fixture
uv run python test/parity/dump_<family>_io.py \
--variant <timm_name> \
--out data/parity/<variant_key>_io.h5
<timm_name> is the dot-separated timm model name (e.g. convnextv2_atto.fcmae). <variant_key> is the Julia symbol form with the dot rewritten as an underscore. Pass --in-chans 1 to produce the single-channel companion fixture; the output filename suffix changes from _io to _in1c_io.
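The naming rules can be captured in a few lines. The following Python sketch is illustrative only: the helpers variant_key and fixture_path are hypothetical names, and the default of three input channels is an assumption.

```python
from pathlib import PurePosixPath


def variant_key(timm_name):
    """Rewrite the dot in a timm model name as an underscore."""
    return timm_name.replace(".", "_")


def fixture_path(timm_name, in_chans=3, root="data/parity"):
    """Fixture location for a variant; in_chans=1 selects the _in1c companion."""
    suffix = "_in1c_io.h5" if in_chans == 1 else "_io.h5"
    return str(PurePosixPath(root) / (variant_key(timm_name) + suffix))
```

For example, fixture_path("convnextv2_atto.fcmae") gives data/parity/convnextv2_atto_fcmae_io.h5, and passing in_chans=1 gives data/parity/convnextv2_atto_fcmae_in1c_io.h5.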
The first dump for a family materializes the Python sidecar environment (PyTorch plus timm plus the small HDF5 helpers). uv sync against the repo's pyproject.toml is the supported provisioning path.
The scripts/test_variant.sh wrapper
For the common case (one variant, dump-if-missing then test), scripts/test_variant.sh chains the fixture dump and the Julia invocation:
# Resolve family, dump fixture if absent, run only this variant:
scripts/test_variant.sh convnextv2_atto_fcmae
# Classic ResNet18:
scripts/test_variant.sh resnet18_a1_in1k
# Single-channel parity test (dumps the _in1c fixture):
scripts/test_variant.sh convnextv2_atto_fcmae --in-chans 1
# Force a fresh fixture dump even if one already exists:
scripts/test_variant.sh convnextv2_atto_fcmae --force
The script resolves the family from the variant prefix, calls the appropriate Python sidecar under test/parity/ via uv run if the HDF5 fixture is missing, then runs the Julia test suite with JIMM_TEST_VARIANTS=<variant> set. It requires both uv and julia on PATH.
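The wrapper's control flow can be sketched as a function that plans the commands without running them. This is a Python illustration, not the contents of scripts/test_variant.sh. The prefix table is hypothetical (in particular, the resnetv2 prefix for BiT is an assumption), the timm name is taken as an explicit argument because this page does not say how the script recovers it from the variant key, and the Julia invocation is assumed to match the one shown earlier.

```python
# (variant-key prefix, sidecar script); longer prefixes are listed first so
# that convnextv2_* is not captured by the convnext entry.
SIDECARS = (
    ("convnextv2", "dump_convnextv2_io.py"),
    ("convnext", "dump_convnext_io.py"),
    ("resnetv2", "dump_resnetv2_bit_io.py"),  # assumed prefix for BiT
    ("resnet", "dump_resnet_io.py"),
)


def sidecar_for(variant_key):
    """Pick the dump script whose prefix matches the variant key."""
    for prefix, script in SIDECARS:
        if variant_key.startswith(prefix):
            return script
    raise ValueError(f"no sidecar known for {variant_key!r}")


def plan(variant_key, timm_name, fixture_exists, force=False, in_chans=3):
    """Return the argv lists the wrapper would run, in order."""
    suffix = "_in1c_io.h5" if in_chans == 1 else "_io.h5"
    fixture = f"data/parity/{variant_key}{suffix}"
    steps = []
    if force or not fixture_exists:
        dump = ["uv", "run", "python",
                f"test/parity/{sidecar_for(variant_key)}",
                "--variant", timm_name, "--out", fixture]
        if in_chans == 1:
            dump += ["--in-chans", "1"]
        steps.append(dump)
    # The test run is always scheduled, scoped to the single variant.
    steps.append(["env", f"JIMM_TEST_VARIANTS={variant_key}",
                  "julia", "--project", "-e", "using Pkg; Pkg.test()"])
    return steps
```

When the fixture already exists and --force is not given, the plan contains only the Julia step. Otherwise the dump step comes first.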
CI
Production CI runs on a self-hosted Linux VM via a Julia TUI driver named jimm-ci (under ci/JimmCI/). The setup is unusual on purpose: the parity test suite needs hundreds of gigabytes of timm reference weights and per-variant HDF5 fixtures that each take a few minutes of CPU plus a PyTorch environment to produce. Hosted CI runners would re-download and re-generate that material on every run; a self-hosted machine with a persistent state directory hits the cache instead.
See ci/README.md for the full deployment story (App registration, secrets layout, the TUI keybindings, and how PR runs map paths to test families). The short version, from a contributor's perspective:
- PRs are reviewed by a maintainer and approved by selecting their row in the jimm-ci TUI on the VM. Fork PRs are filtered out client-side and never produce a Check Run.
- The runner consults ci/JimmCI/src/PathFilter.jl to map changed paths to families. If you add a new family or rename a shared module, that file must be updated; otherwise CI silently skips tests for the new code.
- Per-PR Check Runs come back through the standard GitHub Checks API. Look at the jimm-ci / <family> checks on the PR.
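The reason an out-of-date path filter skips tests silently can be seen in a toy model of path-to-family mapping. The Python sketch below is hypothetical throughout: the RULES table and families_for are invented for illustration and do not reflect the actual contents of PathFilter.jl. Only the file names are taken from the test layout earlier on this page.

```python
# Hypothetical (changed path, families to run) rules.
RULES = (
    ("test/test_resnet.jl", {"resnet"}),
    ("test/test_bit_resnet.jl", {"bit"}),
    ("test/test_convnext.jl", {"convnext"}),
    ("test/test_convnextv2.jl", {"convnextv2"}),
    # A shared helper would plausibly fan out to every parity family.
    ("test/parity/_dump_common.py", {"bit", "resnet", "convnext", "convnextv2"}),
)


def families_for(changed_paths):
    """Union of the families triggered by a set of changed paths."""
    selected = set()
    for path in changed_paths:
        for rule_path, families in RULES:
            if path == rule_path:
                selected |= families
    return selected
```

A path with no matching rule, such as a test file for a family that was never added to the table, contributes nothing to the result. No error is raised and no Check Run appears for it, which is why the filter has to be updated alongside the new code.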
The separate documentation workflow under .github/workflows/docs.yml is a hosted GitHub Actions job. It builds and deploys this documentation site without needing parity weights or fixtures, so the regular runners are sufficient for that path.