Models

Family-agnostic interface

create_pretrained is the symbol-dispatched entry point for loading released weights. It returns the model and a closure that loads the HuggingFace checkpoint into a (ps, st) pair you produce with Lux.setup:

model, load = create_pretrained(variant)
ps, st = Lux.setup(rng, model)
ps, st = load(ps, st)

The closure captures variant, in_chans, num_classes, and the HuggingFace and prefix keyword arguments (revision, cache_dir, prefix) at construction time, so the loader never needs to introspect ps to recover what you already told it. create_model is the random-init counterpart: it returns the bare @compact model with no weights loaded. See Getting Started for the nested pattern with prefix.
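The capture-at-construction idea can be illustrated without Lux or Luximm. The sketch below is a hypothetical stand-in, not the package's implementation: make_loader, set_subtree, and fake_checkpoint are invented names, and plain NamedTuples play the role of ps and the released weights. It only shows how a closure that remembers prefix can write into the matching subtree of a nested parameter tree.

```julia
# Hypothetical stand-in for the loader pattern; none of these names are Luximm API.

# Replace the subtree found at `path` inside a nested NamedTuple.
set_subtree(tree, path::Tuple{}, replacement) = replacement
function set_subtree(tree::NamedTuple, path::Tuple{Symbol,Vararg{Symbol}}, replacement)
    key = first(path)
    child = set_subtree(getproperty(tree, key), Base.tail(path), replacement)
    # merge keeps the original key order and overwrites only `key`.
    return merge(tree, NamedTuple{(key,)}((child,)))
end

# Build a loader closure that captures the checkpoint and the prefix once.
function make_loader(checkpoint::NamedTuple; prefix::Tuple = ())
    load(ps, st) = (set_subtree(ps, prefix, checkpoint), st)
    return load
end

# Stand-in for released weights.
fake_checkpoint = (weight = [1.0, 2.0], bias = [0.5])
load_backbone = make_loader(fake_checkpoint; prefix = (:backbone,))

# Stand-in for the (ps, st) pair of an outer model with two children.
ps = (backbone = (weight = zeros(2), bias = zeros(1)), head = (weight = ones(3),))
st = NamedTuple()

# Only the :backbone subtree is overwritten; :head keeps its initial values.
ps, st = load_backbone(ps, st)
```

Because the prefix is fixed when the closure is built, the call site only passes (ps, st) and gets the updated pair back.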

Luximm.Models.create_pretrained - Function
create_pretrained(variant; in_chans=3, num_classes=nothing,
                  revision="main", cache_dir=hf_hub_cache_dir(),
                  prefix=()) -> (model, load)

Family-agnostic pretrained-weight entry point, mirroring timm.create_model(..., pretrained=True). Returns the model and a closure that loads the released model.safetensors into a (ps, st) pair the caller produced with Lux.setup. The closure captures variant, in_chans, num_classes, and the HuggingFace and prefix keyword arguments (revision, cache_dir, prefix) at construction time, so the call to load is the only place where (ps, st) has to be threaded through.

model, load = create_pretrained(:resnet50_a1_in1k)
ps, st = Lux.setup(Xoshiro(0), model)
ps, st = load(ps, st)

num_classes = nothing (the default) builds the head the released checkpoint ships with, that is, default_num_classes(variant). Pass an explicit 0 for a features-only model, or any other Int to swap in a custom-width head; in that case the released classifier weights are skipped and a warning is emitted.
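The three cases can be written down as a small decision function. This is a sketch of the rule as stated in the paragraph above, not the package's code: head_plan and the kind labels are hypothetical names introduced for illustration.

```julia
# Hypothetical helper mirroring the num_classes rule described above.
function head_plan(num_classes::Union{Nothing,Int}, default::Int)
    if num_classes === nothing
        # Default: build the head the released checkpoint ships with.
        return (width = default, kind = :released)
    elseif num_classes == 0
        # Explicit 0: features-only model, no classifier head.
        return (width = 0, kind = :features_only)
    else
        # Any other Int: custom-width head, released classifier skipped.
        return (width = num_classes, kind = :custom)
    end
end

released = head_plan(nothing, 1000)
features = head_plan(0, 1000)
custom = head_plan(10, 1000)
```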

For composition, build model separately and pass it into an outer @compact, capturing prefix = (:backbone,) so the closure writes into the right subtree:

backbone, load_backbone = create_pretrained(:resnet50_a1_in1k;
    num_classes = 0, prefix = (:backbone,))
outer = @compact(backbone = backbone,
    head = Dense(2048 => num_outputs)) do x
    head(backbone(x))
end
ps, st = Lux.setup(rng, outer)
ps, st = load_backbone(ps, st)

Luximm.Models.create_model - Function
create_model(variant; kwargs...) -> model

Family-agnostic random-init model constructor, mirroring timm.create_model(..., pretrained=False). Dispatches on variant to the matching family constructor and returns the bare @compact model — no parameters, no state, no pretrained weights.

Use this when you want to train from scratch, or as a building block inside an outer @compact when composing a larger model. To load the released weights for a variant, use create_pretrained instead.

model = create_model(:resnet50_a1_in1k; num_classes = 1000)
ps, st = Lux.setup(rng, model)        # random init, ready for training

kwargs are forwarded to the family constructor (in_chans, num_classes).

Luximm.Models.default_num_classes - Function
default_num_classes(variant) -> Int

Head dimension the released checkpoint for variant was trained at. Returns 0 for encoder-only variants (DINOv3 ConvNeXt, ConvNeXtV2 fcmae pretrains).


Per-family namespaces

Each family exports its variant config struct and the <FAMILY>_VARIANTS registry dict. The remaining family internals (per-family constructors, weight mappings, state mappings) live in Luximm.Models.* for callers who need to escape the create_pretrained / create_model front door.
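As an illustration of how such a registry can be queried, the sketch below uses a hypothetical stand-in dict, demo_variants, whose entries are NamedTuples copied from three rows of the ConvNeXt table on this page. The real registries hold the per-family variant structs, whose fields are documented below, so treat the field layout here as an assumption.

```julia
# Hypothetical stand-in for a <FAMILY>_VARIANTS registry (not the real dict).
demo_variants = Dict(
    :convnext_tiny_dinov3_lvd1689m =>
        (default_num_classes = 0, num_features = 768, default_input_size = 224),
    :convnext_tiny_fb_in1k =>
        (default_num_classes = 1000, num_features = 768, default_input_size = 224),
    :convnext_base_fb_in22k =>
        (default_num_classes = 21841, num_features = 1024, default_input_size = 224),
)

# Encoder-only variants are the ones whose released head width is 0.
encoder_only = sort([k for (k, v) in demo_variants if v.default_num_classes == 0])

# Look up a single variant by its Symbol key.
base = demo_variants[:convnext_base_fb_in22k]
```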

ResNet

Luximm.Models.ResNetVariant - Type
ResNetVariant

Architectural config for a classic timm ResNet variant.

Fields:

  • name: lookup key (e.g. :resnet50_a1_in1k).
  • block: residual block type, either :basic (used by r18/r34) or :bottleneck (used by r50/r101/r152).
  • layers: per-stage block count (d1, d2, d3, d4).
  • planes: base channel widths per stage (64, 128, 256, 512). Multiplied by 4 inside :bottleneck stages to give the actual output channel count.
  • num_features: backbone output channels (planes[end] for :basic, planes[end] * 4 for :bottleneck).
  • hf_repo: HuggingFace repo containing model.safetensors.
  • default_num_classes: head dimension the released weights ship with.
  • default_input_size: native training resolution (224 for every registered variant). Informational only: the model is fully convolutional and accepts any size.
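The relationship between block, planes, and num_features in the field list can be checked with a few lines of arithmetic. The helper names expansion and num_features_for are hypothetical; only the rule itself (factor 4 for :bottleneck, factor 1 for :basic) comes from the descriptions above.

```julia
# Channel expansion per block type, as described in the field list.
function expansion(block::Symbol)
    if block === :bottleneck
        return 4
    elseif block === :basic
        return 1
    else
        throw(ArgumentError("unknown block type: $block"))
    end
end

# Backbone output channels: last base width times the block expansion.
num_features_for(block::Symbol, planes::NTuple{4,Int}) = planes[end] * expansion(block)

planes = (64, 128, 256, 512)
basic_features = num_features_for(:basic, planes)
bottleneck_features = num_features_for(:bottleneck, planes)
```

These values match the num_features column of the registered ResNet variants: 512 for r18/r34 and 2048 for r50/r101/r152.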

Luximm.Models.RESNET_VARIANTS - Constant
RESNET_VARIANTS :: Dict{Symbol, ResNetVariant}

Lookup table for classic ResNet variants currently ported from timm. Keys are the timm model names with dots rewritten as underscores.
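The key convention is easy to reproduce. The one-liner below is a minimal sketch with a hypothetical helper name, timm_name_to_key; it is not part of the package, it only applies the dot-to-underscore rule stated above.

```julia
# Hypothetical helper: rewrite the dots of a timm model name as underscores.
timm_name_to_key(name::AbstractString) = Symbol(replace(name, "." => "_"))

key = timm_name_to_key("resnet50.a1_in1k")
```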


Registered variants

| Variant | num_classes | num_features | input size |
|---|---|---|---|
| :resnet101_a1_in1k | 1000 | 2048 | 224 |
| :resnet152_a1_in1k | 1000 | 2048 | 224 |
| :resnet18_a1_in1k | 1000 | 512 | 224 |
| :resnet34_a1_in1k | 1000 | 512 | 224 |
| :resnet50_a1_in1k | 1000 | 2048 | 224 |

BiT ResNetV2

Luximm.Models.BiTVariant - Type
BiTVariant

Architectural config for a single BiT ResNetV2 variant.

Fields:

  • name: lookup key (e.g. :resnetv2_50x1_bit_goog_in21k).
  • layers: per-stage depth tuple: (3,4,6,3) for r50, (3,4,23,3) for r101, (3,8,36,3) for r152.
  • width_factor: integer width multiplier from the timm name suffix (x1, x2, x3, x4).
  • stem_chs: stem output channels (64 * width_factor).
  • stage_chs: per-stage output channel tuple (base widths (256,512,1024,2048) scaled by width_factor).
  • num_features: backbone output channels (stage_chs[end]).
  • hf_repo: HuggingFace repo containing model.safetensors.
  • default_num_classes: head dimension the released weights were trained with (21843 for goog_in21k, 1000 for the in1k tags).
  • default_input_size: native training resolution (224 for most tags, 384 for the _384 teacher variant). The model itself is fully convolutional and accepts any input size; this is just what the released weights were tuned at.
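The width scaling described for stem_chs and stage_chs can be sketched as follows. The function names are hypothetical and reuse the field names only for readability; the base widths and the 64 * width_factor rule are taken from the field list above.

```julia
# Stem output channels scale linearly with the width factor.
stem_channels(width_factor::Int) = 64 * width_factor

# Per-stage output channels: base widths scaled by the width factor.
stage_channels(width_factor::Int; base = (256, 512, 1024, 2048)) = base .* width_factor

x1 = stage_channels(1)
x3 = stage_channels(3)
x4 = stage_channels(4)
stem_x2 = stem_channels(2)
```

The last entry of each tuple is the num_features value listed for the corresponding x1, x3, and x4 variants.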

Luximm.Models.BIT_VARIANTS - Constant
BIT_VARIANTS :: Dict{Symbol, BiTVariant}

Lookup table for the BiT variants this package currently ports. Keys mirror the timm model name with the dot rewritten as an underscore (the dot is reserved in Julia identifiers); the full timm name with the dot lives at BIT_VARIANTS[key].hf_repo.


Registered variants

| Variant | num_classes | num_features | input size |
|---|---|---|---|
| :resnetv2_101x1_bit_goog_in21k | 21843 | 2048 | 224 |
| :resnetv2_101x1_bit_goog_in21k_ft_in1k | 1000 | 2048 | 224 |
| :resnetv2_101x3_bit_goog_in21k | 21843 | 6144 | 224 |
| :resnetv2_101x3_bit_goog_in21k_ft_in1k | 1000 | 6144 | 224 |
| :resnetv2_152x2_bit_goog_in21k | 21843 | 4096 | 224 |
| :resnetv2_152x2_bit_goog_in21k_ft_in1k | 1000 | 4096 | 224 |
| :resnetv2_152x2_bit_goog_teacher_in21k_ft_in1k | 1000 | 4096 | 224 |
| :resnetv2_152x2_bit_goog_teacher_in21k_ft_in1k_384 | 1000 | 4096 | 384 |
| :resnetv2_152x4_bit_goog_in21k | 21843 | 8192 | 224 |
| :resnetv2_152x4_bit_goog_in21k_ft_in1k | 1000 | 8192 | 224 |
| :resnetv2_50x1_bit_goog_distilled_in1k | 1000 | 2048 | 224 |
| :resnetv2_50x1_bit_goog_in21k | 21843 | 2048 | 224 |
| :resnetv2_50x1_bit_goog_in21k_ft_in1k | 1000 | 2048 | 224 |
| :resnetv2_50x3_bit_goog_in21k | 21843 | 6144 | 224 |
| :resnetv2_50x3_bit_goog_in21k_ft_in1k | 1000 | 6144 | 224 |

ConvNeXt

Luximm.Models.ConvNeXtVariant - Type
ConvNeXtVariant

Architectural config for a single ConvNeXt v1 variant.

Fields:

  • name: lookup key (e.g. :convnext_tiny_dinov3_lvd1689m).
  • depths: per-stage block count, (d1, d2, d3, d4).
  • dims: per-stage channel widths, (c1, c2, c3, c4). c1 is also the stem output channels. c4 is num_features.
  • hf_repo: HuggingFace repo containing model.safetensors.
  • default_num_classes: head dimension the released weights ship with. 0 for the DINOv3 encoders (no usable head).
  • default_input_size: native training resolution (224, 384, …) for the released checkpoint. Informational only: the model is fully convolutional and accepts any size, so this is not enforced.
  • ls_init: LayerScale init value (gamma parameter in timm). All v1 variants released so far use 1e-6; kept as a field in case future ports need a different value.

Luximm.Models.CONVNEXT_VARIANTS - Constant
CONVNEXT_VARIANTS :: Dict{Symbol, ConvNeXtVariant}

Lookup table for the ConvNeXt v1 variants this package ports: the DINOv3 encoders and the Facebook AI checkpoints from the original ConvNeXt paper. Additional convnext_* lineages (.in12k_*, .clip_*) can be registered without touching the constructor or mapping code.

DINOv3 weights are not Apache 2.0

The four :convnext_*_dinov3_lvd1689m encoders are released by Meta under the DINOv3 License, which imposes obligations on outputs derived from the weights that differ from a standard permissive open-source license. Read the license before using the weights for any downstream task. This applies only to the weights; the Julia code in this package is Apache 2.0. The Facebook AI .fb_* checkpoints carry the upstream Apache 2.0 license and are unaffected.

Registered variants

| Variant | num_classes | num_features | input size |
|---|---|---|---|
| :convnext_base_dinov3_lvd1689m | 0 | 1024 | 224 |
| :convnext_base_fb_in1k | 1000 | 1024 | 224 |
| :convnext_base_fb_in22k | 21841 | 1024 | 224 |
| :convnext_base_fb_in22k_ft_in1k | 1000 | 1024 | 224 |
| :convnext_base_fb_in22k_ft_in1k_384 | 1000 | 1024 | 384 |
| :convnext_large_dinov3_lvd1689m | 0 | 1536 | 224 |
| :convnext_large_fb_in1k | 1000 | 1536 | 224 |
| :convnext_large_fb_in22k | 21841 | 1536 | 224 |
| :convnext_large_fb_in22k_ft_in1k | 1000 | 1536 | 224 |
| :convnext_large_fb_in22k_ft_in1k_384 | 1000 | 1536 | 384 |
| :convnext_small_dinov3_lvd1689m | 0 | 768 | 224 |
| :convnext_small_fb_in1k | 1000 | 768 | 224 |
| :convnext_small_fb_in22k | 21841 | 768 | 224 |
| :convnext_small_fb_in22k_ft_in1k | 1000 | 768 | 224 |
| :convnext_small_fb_in22k_ft_in1k_384 | 1000 | 768 | 384 |
| :convnext_tiny_dinov3_lvd1689m | 0 | 768 | 224 |
| :convnext_tiny_fb_in1k | 1000 | 768 | 224 |
| :convnext_tiny_fb_in22k | 21841 | 768 | 224 |
| :convnext_tiny_fb_in22k_ft_in1k | 1000 | 768 | 224 |
| :convnext_tiny_fb_in22k_ft_in1k_384 | 1000 | 768 | 384 |
| :convnext_xlarge_fb_in22k | 21841 | 2048 | 224 |
| :convnext_xlarge_fb_in22k_ft_in1k | 1000 | 2048 | 224 |
| :convnext_xlarge_fb_in22k_ft_in1k_384 | 1000 | 2048 | 384 |

ConvNeXt V2

Luximm.Models.ConvNeXtV2Variant - Type
ConvNeXtV2Variant

Architectural config for a single ConvNeXtV2 variant.

Fields:

  • name: lookup key (e.g. :convnextv2_atto_fcmae).
  • depths: per-stage block count, (d1, d2, d3, d4).
  • dims: per-stage channel widths, (c1, c2, c3, c4). c1 is also the stem output channels. c4 is num_features.
  • hf_repo: HuggingFace repo containing model.safetensors.
  • default_num_classes: head dimension the released weights ship with. 0 for the bare .fcmae encoders, 1000 for the ImageNet-1K and ImageNet-22k-then-1K fine-tunes.
  • default_input_size: native training resolution (224, 384, or 512) for the released checkpoint. Informational only: the model is fully convolutional and accepts any size, so this is not enforced.

Luximm.Models.CONVNEXTV2_VARIANTS - Constant
CONVNEXTV2_VARIANTS :: Dict{Symbol, ConvNeXtV2Variant}

Lookup table for the ConvNeXtV2 variants this package ports. The .fcmae rows are the bare encoders; all other rows ship a 1000-class ImageNet head. convnextv2_small is not included because timm only registers it as .untrained (no pretrained weights).

ConvNeXtV2 weights are non-commercial

Every ConvNeXtV2 checkpoint is released by Meta under Creative Commons Attribution-NonCommercial 4.0. Commercial use of these weights is not permitted. This applies to every row in the variant table below and is independent of Luximm.jl's own Apache 2.0 code license. If commercial use matters, BiT (Apache 2.0) or the ConvNeXt v1 .fb_* checkpoints (Apache 2.0) are the alternatives.

Registered variants

| Variant | num_classes | num_features | input size |
|---|---|---|---|
| :convnextv2_atto_fcmae | 0 | 320 | 224 |
| :convnextv2_atto_fcmae_ft_in1k | 1000 | 320 | 224 |
| :convnextv2_base_fcmae | 0 | 1024 | 224 |
| :convnextv2_base_fcmae_ft_in1k | 1000 | 1024 | 224 |
| :convnextv2_base_fcmae_ft_in22k_in1k | 1000 | 1024 | 224 |
| :convnextv2_base_fcmae_ft_in22k_in1k_384 | 1000 | 1024 | 384 |
| :convnextv2_femto_fcmae | 0 | 384 | 224 |
| :convnextv2_femto_fcmae_ft_in1k | 1000 | 384 | 224 |
| :convnextv2_huge_fcmae | 0 | 2816 | 224 |
| :convnextv2_huge_fcmae_ft_in1k | 1000 | 2816 | 224 |
| :convnextv2_huge_fcmae_ft_in22k_in1k_384 | 1000 | 2816 | 384 |
| :convnextv2_huge_fcmae_ft_in22k_in1k_512 | 1000 | 2816 | 512 |
| :convnextv2_large_fcmae | 0 | 1536 | 224 |
| :convnextv2_large_fcmae_ft_in1k | 1000 | 1536 | 224 |
| :convnextv2_large_fcmae_ft_in22k_in1k | 1000 | 1536 | 224 |
| :convnextv2_large_fcmae_ft_in22k_in1k_384 | 1000 | 1536 | 384 |
| :convnextv2_nano_fcmae | 0 | 640 | 224 |
| :convnextv2_nano_fcmae_ft_in1k | 1000 | 640 | 224 |
| :convnextv2_nano_fcmae_ft_in22k_in1k | 1000 | 640 | 224 |
| :convnextv2_nano_fcmae_ft_in22k_in1k_384 | 1000 | 640 | 384 |
| :convnextv2_pico_fcmae | 0 | 512 | 224 |
| :convnextv2_pico_fcmae_ft_in1k | 1000 | 512 | 224 |
| :convnextv2_tiny_fcmae | 0 | 768 | 224 |
| :convnextv2_tiny_fcmae_ft_in1k | 1000 | 768 | 224 |
| :convnextv2_tiny_fcmae_ft_in22k_in1k | 1000 | 768 | 224 |
| :convnextv2_tiny_fcmae_ft_in22k_in1k_384 | 1000 | 768 | 384 |