Models
Family-agnostic interface
create_pretrained is the symbol-dispatched entry point for loading released weights. It returns the model and a closure that loads the HuggingFace checkpoint into a (ps, st) pair you produce with Lux.setup:
model, load = create_pretrained(variant)
ps, st = Lux.setup(rng, model)
ps, st = load(ps, st)
The closure captures variant, in_chans, num_classes, and the HF / prefix kwargs at construction time, so the loader body no longer needs to introspect ps to recover what you already told it. create_model is the random-init counterpart: it returns the bare @compact model with no weights loaded. See Getting Started for the nested pattern with prefix.
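The capture pattern can be illustrated without Luximm itself. The sketch below is a toy, not the package's loader: create_pretrained_sketch, set_at, and the fake_checkpoint NamedTuple are hypothetical names, and the target subtree is replaced wholesale rather than mapped key by key from a safetensors file. It only shows how a closure that remembers prefix can write into the right subtree of an outer ps.

```julia
# Replace the subtree of the nested NamedTuple `ps` found at `path` with `new`.
function set_at(ps::NamedTuple, path::Tuple, new)
    isempty(path) && return new
    key = first(path)
    child = set_at(getproperty(ps, key), Base.tail(path), new)
    return merge(ps, NamedTuple{(key,)}((child,)))
end

# Toy stand-in for the (model, load) pattern. `fake_checkpoint` plays the
# role of the downloaded weights (an assumption for illustration only).
function create_pretrained_sketch(fake_checkpoint::NamedTuple; prefix::Tuple = ())
    model = :toy_model  # placeholder for the real @compact model
    # The closure captures `fake_checkpoint` and `prefix` at construction time.
    load(ps, st) = (set_at(ps, prefix, fake_checkpoint), st)
    return model, load
end

ckpt = (weight = [1.0, 2.0],)
_, load = create_pretrained_sketch(ckpt; prefix = (:backbone,))

# Parameters of an outer model with a backbone and a separate head.
ps = (backbone = (weight = [0.0, 0.0],), head = (weight = [0.0],))
ps2, st2 = load(ps, NamedTuple())
```

After the call, ps2.backbone holds the checkpoint values while ps2.head is left untouched, which is the behavior the prefix kwarg is meant to provide.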
Luximm.Models.create_pretrained — Function
create_pretrained(variant; in_chans=3, num_classes=nothing,
revision="main", cache_dir=hf_hub_cache_dir(),
prefix=()) -> (model, load)
Family-agnostic pretrained-weight entry point, mirroring timm.create_model(..., pretrained=True). Returns the model and a closure that loads the released model.safetensors into a (ps, st) pair the caller produced with Lux.setup. The closure captures variant, in_chans, num_classes, and the HF / prefix kwargs at construction time, so calling it is the only place (ps, st) need to be threaded.
model, load = create_pretrained(:resnet50_a1_in1k)
ps, st = Lux.setup(Xoshiro(0), model)
ps, st = load(ps, st)
num_classes = nothing (the default) builds the head the released checkpoint ships with: default_num_classes(variant). Pass an explicit 0 for a features-only model, or any other Int to swap in a custom-width head (the released classifier is then skipped and the warning case fires).
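The three num_classes cases can be summarized as a small decision function. This is a minimal sketch under stated assumptions, not Luximm's code: resolve_head is a hypothetical helper, and treating an explicit value equal to the checkpoint default the same as nothing is a guess the docstring does not confirm.

```julia
# Hypothetical helper: resolve the requested head width against the width
# the released checkpoint ships with.
function resolve_head(requested::Union{Nothing,Int}, released::Int)
    width = requested === nothing ? released : requested
    # Assumption: the released classifier is only reusable when a head exists
    # and its width matches the checkpoint.
    load_classifier = width != 0 && width == released
    return (num_classes = width, load_classifier = load_classifier)
end

resolve_head(nothing, 1000)  # (num_classes = 1000, load_classifier = true)
resolve_head(0, 1000)        # (num_classes = 0, load_classifier = false)
resolve_head(10, 1000)       # (num_classes = 10, load_classifier = false)
```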
For composition, build model separately and pass it into an outer @compact, capturing prefix = (:backbone,) so the closure writes into the right subtree:
backbone, load_backbone = create_pretrained(:resnet50_a1_in1k;
                                            num_classes = 0, prefix = (:backbone,))
outer = @compact(backbone = backbone,
                 head = Dense(2048 => num_outputs)) do x
    head(backbone(x))
end
ps, st = Lux.setup(rng, outer)
ps, st = load_backbone(ps, st)
Luximm.Models.create_model — Function
create_model(variant; kwargs...) -> model
Family-agnostic random-init model constructor, mirroring timm.create_model(..., pretrained=False). Dispatches on variant to the matching family constructor and returns the bare @compact model: no parameters, no state, no pretrained weights.
Use this when you want to train from scratch, or as a building block inside an outer @compact when composing a larger model. To load the released weights for a variant, use create_pretrained instead.
model = create_model(:resnet50_a1_in1k; num_classes = 1000)
ps, st = Lux.setup(rng, model)  # random init, ready for training
kwargs are forwarded to the family constructor (in_chans, num_classes).
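One way to picture the variant-to-family dispatch is a lookup over per-family registries. The sketch below is an assumption about the general shape, not the package's mechanism: TOY_FAMILIES, family_of, and create_model_sketch are hypothetical names, and the returned NamedTuple stands in for a real model.

```julia
# Toy registries standing in for RESNET_VARIANTS, VGG_VARIANTS, and so on.
const TOY_FAMILIES = Dict(
    :resnet => Set([:resnet50_a1_in1k, :resnet18_a1_in1k]),
    :vgg    => Set([:vgg16_tv_in1k]),
)

# Find the family that registers `variant`; unknown names raise an error.
function family_of(variant::Symbol)
    for (family, variants) in TOY_FAMILIES
        variant in variants && return family
    end
    throw(ArgumentError("unknown variant: $variant"))
end

# kwargs are passed through untouched, as the docstring describes.
create_model_sketch(variant::Symbol; kwargs...) =
    (family = family_of(variant), variant = variant, kwargs = (; kwargs...))

m = create_model_sketch(:resnet50_a1_in1k; num_classes = 1000)
```

Here m.family is :resnet and m.kwargs.num_classes is 1000; a real implementation would call the family constructor with those kwargs instead of returning them.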
Luximm.Models.default_num_classes — Function
default_num_classes(variant) -> Int
Head dimension the released checkpoint for variant was trained at. Returns 0 for encoder-only variants (DINOv3 ConvNeXt, ConvNeXtV2 fcmae pretrains).
Per-family namespaces
Each family exports its variant config struct and the <FAMILY>_VARIANTS registry dict. The remaining family internals (per-family constructors, weight mappings, state mappings) live in Luximm.Models.* for callers who need to escape the create_pretrained / create_model front door.
ResNet
Luximm.Models.ResNetVariant — Type
ResNetVariant
Architectural config for a classic timm ResNet variant.
Fields:
- name: lookup key (e.g. :resnet50_a1_in1k).
- block: residual block type, either :basic (used by r18/r34) or :bottleneck (used by r50/r101/r152).
- layers: per-stage block count (d1, d2, d3, d4).
- planes: base channel widths per stage (64, 128, 256, 512). Multiplied by 4 inside :bottleneck stages to give the actual output channel count.
- num_features: backbone output channels (planes[end] for :basic, planes[end] * 4 for :bottleneck).
- hf_repo: HuggingFace repo containing model.safetensors.
- default_num_classes: head dimension the released weights ship with.
- default_input_size: native training resolution (224 for every registered variant). Informational only: the model is fully convolutional and accepts any size.
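The relation between planes, block, and num_features is plain arithmetic. A minimal sketch, with EXPANSION, stage_channels, and num_features_sketch as hypothetical names that are not part of the package:

```julia
# Channel expansion applied on top of the base widths in `planes`.
const EXPANSION = Dict(:basic => 1, :bottleneck => 4)

# Actual output channels per stage for a given block type.
stage_channels(block::Symbol, planes::NTuple{4,Int}) = planes .* EXPANSION[block]

# Backbone output channels are the last stage's output channels.
num_features_sketch(block::Symbol, planes::NTuple{4,Int}) =
    last(stage_channels(block, planes))

planes = (64, 128, 256, 512)
num_features_sketch(:basic, planes)       # 512, as for r18/r34
num_features_sketch(:bottleneck, planes)  # 2048, as for r50/r101/r152
```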
Luximm.Models.RESNET_VARIANTS — Constant
RESNET_VARIANTS :: Dict{Symbol, ResNetVariant}
Lookup table for classic ResNet variants currently ported from timm. Keys are the timm model names with dots rewritten as underscores.
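The key convention is a plain string rewrite. The helper below is a hypothetical illustration (variant_key is not a documented Luximm function); it only shows how a timm name maps to the Symbol used as a registry key.

```julia
# Rewrite the dots of a timm model name as underscores and return a Symbol.
variant_key(timm_name::AbstractString) = Symbol(replace(timm_name, "." => "_"))

variant_key("resnet50.a1_in1k")              # :resnet50_a1_in1k
variant_key("resnetv2_50x1_bit.goog_in21k")  # :resnetv2_50x1_bit_goog_in21k
```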
Registered variants
| Variant | num_classes | num_features | input size |
|---|---|---|---|
| :resnet101_a1_in1k | 1000 | 2048 | 224 |
| :resnet152_a1_in1k | 1000 | 2048 | 224 |
| :resnet18_a1_in1k | 1000 | 512 | 224 |
| :resnet34_a1_in1k | 1000 | 512 | 224 |
| :resnet50_a1_in1k | 1000 | 2048 | 224 |
SE-ResNet
Luximm.Models.SEResNetVariant — Type
SEResNetVariant
Architectural config for a single SE-ResNet variant.
Fields:
- name: lookup key (e.g. :seresnet50_a1_in1k).
- layers: per-stage block count (d1, d2, d3, d4).
- planes: base channel widths per stage (64, 128, 256, 512). Multiplied by 4 (the bottleneck expansion) to give the actual stage output channels.
- num_features: backbone output channels (planes[end] * 4 = 2048).
- hf_repo: HuggingFace repo containing model.safetensors.
- default_num_classes: head dimension the released weights ship with.
- default_input_size: native training resolution (224). Informational only.
- se_reduction: SE bottleneck reduction divisor (16 for every variant); the SE inner width is se_make_divisible(out_ch / se_reduction, 8).
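The SE inner width formula can be made concrete with a rounding helper. The sketch below assumes se_make_divisible behaves like timm's make_divisible; that is an assumption, since the body of se_make_divisible is not shown here, and make_divisible_sketch is a hypothetical name. The round_limit value used by the package is likewise unknown, so it is exposed as a keyword.

```julia
# Round `v` to the nearest multiple of `divisor`, never below `min_value`.
# If rounding down lost more than (1 - round_limit) of `v`, bump up one step.
function make_divisible_sketch(v::Real, divisor::Int = 8;
                               min_value::Int = divisor, round_limit::Real = 0.9)
    new_v = max(min_value, floor(Int, v + divisor / 2) ÷ divisor * divisor)
    if new_v < round_limit * v
        new_v += divisor
    end
    return new_v
end

# Stage output channels of a bottleneck SE-ResNet and the reduction divisor.
out_chs = (256, 512, 1024, 2048)
se_reduction = 16
se_widths = map(c -> make_divisible_sketch(c / se_reduction, 8), out_chs)
```

For these channel counts the division is already exact, so se_widths is (16, 32, 64, 128) regardless of the round_limit choice.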
Luximm.Models.SERESNET_VARIANTS — Constant
SERESNET_VARIANTS :: Dict{Symbol, SEResNetVariant}
Lookup table for the SE-ResNet variants ported from timm. Keys are the timm model name with dots rewritten as underscores.
Registered variants
| Variant | num_classes | num_features | input size |
|---|---|---|---|
| :seresnet50_a1_in1k | 1000 | 2048 | 224 |
BiT ResNetV2
Luximm.Models.BiTVariant — Type
BiTVariant
Architectural config for a single BiT ResNetV2 variant.
Fields:
- name: lookup key (e.g. :resnetv2_50x1_bit_goog_in21k).
- layers: per-stage depth tuple: (3,4,6,3) for r50, (3,4,23,3) for r101, (3,8,36,3) for r152.
- width_factor: integer width multiplier from the timm name suffix (x1, x2, x3, x4).
- stem_chs: stem output channels (64 * width_factor).
- stage_chs: per-stage output channel tuple (base widths (256,512,1024,2048) scaled by width_factor).
- num_features: backbone output channels (stage_chs[end]).
- hf_repo: HuggingFace repo containing model.safetensors.
- default_num_classes: head dimension the released weights were trained with (21843 for goog_in21k, 1000 for the in1k tags).
- default_input_size: native training resolution (224 for most tags, 384 for the _384 teacher variant). The model itself is fully convolutional and accepts any input size; this is just what the released weights were tuned at.
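The width scaling described by these fields is easy to check by hand. The helpers below are hypothetical (none of these names are package API), and the regex for pulling the factor out of a variant key is an illustrative assumption rather than how the registry is built.

```julia
# Channel counts derived from the integer width multiplier.
bit_stem_chs(width_factor::Int) = 64 * width_factor
bit_stage_chs(width_factor::Int) = (256, 512, 1024, 2048) .* width_factor
bit_num_features(width_factor::Int) = last(bit_stage_chs(width_factor))

# Read the width factor from a key such as :resnetv2_50x3_bit_goog_in21k.
function width_factor_from_key(key::Symbol)
    m = match(r"x(\d+)_bit", String(key))
    m === nothing && throw(ArgumentError("no width suffix in $key"))
    return parse(Int, m[1])
end

wf = width_factor_from_key(:resnetv2_50x3_bit_goog_in21k)  # 3
bit_num_features(wf)                                       # 6144
```

The result agrees with the num_features column of the variant table: 2048 for x1, 4096 for x2, 6144 for x3, and 8192 for x4.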
Luximm.Models.BIT_VARIANTS — Constant
BIT_VARIANTS :: Dict{Symbol, BiTVariant}
Lookup table for the BiT variants this package currently ports. Keys mirror the timm model name with the dot rewritten as an underscore (the dot is reserved in Julia identifiers); the full timm name with the dot lives at BIT_VARIANTS[key].hf_repo.
Registered variants
| Variant | num_classes | num_features | input size |
|---|---|---|---|
| :resnetv2_101x1_bit_goog_in21k | 21843 | 2048 | 224 |
| :resnetv2_101x1_bit_goog_in21k_ft_in1k | 1000 | 2048 | 224 |
| :resnetv2_101x3_bit_goog_in21k | 21843 | 6144 | 224 |
| :resnetv2_101x3_bit_goog_in21k_ft_in1k | 1000 | 6144 | 224 |
| :resnetv2_152x2_bit_goog_in21k | 21843 | 4096 | 224 |
| :resnetv2_152x2_bit_goog_in21k_ft_in1k | 1000 | 4096 | 224 |
| :resnetv2_152x2_bit_goog_teacher_in21k_ft_in1k | 1000 | 4096 | 224 |
| :resnetv2_152x2_bit_goog_teacher_in21k_ft_in1k_384 | 1000 | 4096 | 384 |
| :resnetv2_152x4_bit_goog_in21k | 21843 | 8192 | 224 |
| :resnetv2_152x4_bit_goog_in21k_ft_in1k | 1000 | 8192 | 224 |
| :resnetv2_50x1_bit_goog_distilled_in1k | 1000 | 2048 | 224 |
| :resnetv2_50x1_bit_goog_in21k | 21843 | 2048 | 224 |
| :resnetv2_50x1_bit_goog_in21k_ft_in1k | 1000 | 2048 | 224 |
| :resnetv2_50x3_bit_goog_in21k | 21843 | 6144 | 224 |
| :resnetv2_50x3_bit_goog_in21k_ft_in1k | 1000 | 6144 | 224 |
ConvNeXt
Luximm.Models.ConvNeXtVariant — Type
ConvNeXtVariant
Architectural config for a single ConvNeXt v1 variant.
Fields:
- name: lookup key (e.g. :convnext_tiny_dinov3_lvd1689m).
- depths: per-stage block count, (d1, d2, d3, d4).
- dims: per-stage channel widths, (c1, c2, c3, c4). c1 is also the stem output channels. c4 is num_features.
- hf_repo: HuggingFace repo containing model.safetensors.
- default_num_classes: head dimension the released weights ship with. 0 for the DINO encoders (no usable head).
- default_input_size: native training resolution (224, 384, …) for the released checkpoint. Informational only: the model is fully convolutional and accepts any size, so this is not enforced.
- ls_init: LayerScale init value (gamma parameter in timm). All v1 variants released so far use 1e-6; kept as a field in case future ports need a different value.
Luximm.Models.CONVNEXT_VARIANTS — Constant
CONVNEXT_VARIANTS :: Dict{Symbol, ConvNeXtVariant}
Lookup table for the ConvNeXt v1 variants this package ports: the DINOv3 encoders and the Facebook AI checkpoints from the original ConvNeXt paper. Additional convnext_* lineages (.in12k_*, .clip_*) can be registered without touching the constructor or mapping code.
The four :convnext_*_dinov3_lvd1689m encoders are released by Meta under the DINOv3 License, which imposes obligations on outputs derived from the weights that differ from a standard permissive open-source license. Read the license before using the weights for any downstream task. This applies only to the weights; the Julia code in this package is Apache 2.0. The Facebook AI .fb_* checkpoints carry the upstream Apache 2.0 license and are unaffected.
Registered variants
| Variant | num_classes | num_features | input size |
|---|---|---|---|
| :convnext_base_dinov3_lvd1689m | 0 | 1024 | 224 |
| :convnext_base_fb_in1k | 1000 | 1024 | 224 |
| :convnext_base_fb_in22k | 21841 | 1024 | 224 |
| :convnext_base_fb_in22k_ft_in1k | 1000 | 1024 | 224 |
| :convnext_base_fb_in22k_ft_in1k_384 | 1000 | 1024 | 384 |
| :convnext_large_dinov3_lvd1689m | 0 | 1536 | 224 |
| :convnext_large_fb_in1k | 1000 | 1536 | 224 |
| :convnext_large_fb_in22k | 21841 | 1536 | 224 |
| :convnext_large_fb_in22k_ft_in1k | 1000 | 1536 | 224 |
| :convnext_large_fb_in22k_ft_in1k_384 | 1000 | 1536 | 384 |
| :convnext_small_dinov3_lvd1689m | 0 | 768 | 224 |
| :convnext_small_fb_in1k | 1000 | 768 | 224 |
| :convnext_small_fb_in22k | 21841 | 768 | 224 |
| :convnext_small_fb_in22k_ft_in1k | 1000 | 768 | 224 |
| :convnext_small_fb_in22k_ft_in1k_384 | 1000 | 768 | 384 |
| :convnext_tiny_dinov3_lvd1689m | 0 | 768 | 224 |
| :convnext_tiny_fb_in1k | 1000 | 768 | 224 |
| :convnext_tiny_fb_in22k | 21841 | 768 | 224 |
| :convnext_tiny_fb_in22k_ft_in1k | 1000 | 768 | 224 |
| :convnext_tiny_fb_in22k_ft_in1k_384 | 1000 | 768 | 384 |
| :convnext_xlarge_fb_in22k | 21841 | 2048 | 224 |
| :convnext_xlarge_fb_in22k_ft_in1k | 1000 | 2048 | 224 |
| :convnext_xlarge_fb_in22k_ft_in1k_384 | 1000 | 2048 | 384 |
ConvNeXt V2
Luximm.Models.ConvNeXtV2Variant — Type
ConvNeXtV2Variant
Architectural config for a single ConvNeXtV2 variant.
Fields:
- name: lookup key (e.g. :convnextv2_atto_fcmae).
- depths: per-stage block count, (d1, d2, d3, d4).
- dims: per-stage channel widths, (c1, c2, c3, c4). c1 is also the stem output channels. c4 is num_features.
- hf_repo: HuggingFace repo containing model.safetensors.
- default_num_classes: head dimension the released weights ship with. 0 for the bare .fcmae encoders, 1000 for the ImageNet-1K and ImageNet-22k-then-1K fine-tunes.
- default_input_size: native training resolution (224, 384, or 512) for the released checkpoint. Informational only: the model is fully convolutional and accepts any size, so this is not enforced.
Luximm.Models.CONVNEXTV2_VARIANTS — Constant
CONVNEXTV2_VARIANTS :: Dict{Symbol, ConvNeXtV2Variant}
Lookup table for the ConvNeXtV2 variants this package ports. The .fcmae rows are the bare encoders; all other rows ship a 1000-class ImageNet head. convnextv2_small is not included because timm only registers it as .untrained (no pretrained weights).
Every ConvNeXtV2 checkpoint is released by Meta under Creative Commons Attribution-NonCommercial 4.0. Commercial use of these weights is not permitted. This applies to every row in the variant table below and is independent of Luximm.jl's own Apache 2.0 code license. If commercial use matters, BiT (Apache 2.0) or the ConvNeXt v1 .fb_* checkpoints (Apache 2.0) are the alternatives.
Registered variants
VGG
Luximm.Models.VGGVariant — Type
VGGVariant
Architectural config for a single VGG variant.
Fields:
- name: lookup key (e.g. :vgg16_tv_in1k).
- cfg: flat layer list. Each entry is either an Int (a 3x3 pad-1 conv with that output-channel count, followed by ReLU) or the Symbol :M (a 2x2 stride-2 max-pool). Matches timm's cfgs table in vgg.py.
- batch_norm: whether a BatchNorm sits between every conv and its ReLU (the *_bn checkpoints).
- hf_repo: HuggingFace repo containing model.safetensors.
- default_num_classes: head dimension the released weights ship with (1000 for every registered variant).
- default_input_size: native training resolution (224). Informational only: the model accepts any size large enough for the 7x7 pre_logits conv, but the released head was trained at 224.
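Walking a cfg list shows how the flat encoding determines the feature map that reaches the 7x7 pre_logits conv. The sketch below is illustrative only: trace_cfg is a hypothetical helper, and VGG16_CFG is the VGG-16 layer list as it appears in timm's vgg.py, assumed to match what the :vgg16_tv_in1k entry stores.

```julia
# VGG-16 layer list: Ints are conv output channels, :M is a max-pool.
const VGG16_CFG = Any[64, 64, :M, 128, 128, :M, 256, 256, 256, :M,
                      512, 512, 512, :M, 512, 512, 512, :M]

# Track channels, spatial side length, and conv count through the cfg.
function trace_cfg(cfg, in_chans::Int, side::Int)
    chans, nconv = in_chans, 0
    for entry in cfg
        if entry === :M
            side ÷= 2       # 2x2 stride-2 max-pool halves each spatial side
        else
            chans = entry   # 3x3 pad-1 conv keeps the size, sets the channels
            nconv += 1
        end
    end
    return (channels = chans, side = side, convs = nconv)
end

trace_cfg(VGG16_CFG, 3, 224)  # (channels = 512, side = 7, convs = 13)
```

At the native 224 input the five pools leave a 7x7 map, exactly the size the 7x7 pre_logits conv needs; a smaller input would leave fewer than 7 pixels per side.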
Luximm.Models.VGG_VARIANTS — Constant
VGG_VARIANTS :: Dict{Symbol, VGGVariant}
Lookup table for the VGG variants ported from timm: the four classic depths (11/13/16/19) in plain and BatchNorm flavors, all torchvision tv_in1k checkpoints. Keys are the timm model name with the dot rewritten as an underscore.
Registered variants
| Variant | num_classes | num_features | input size |
|---|---|---|---|
| :vgg11_bn_tv_in1k | 1000 | 512 | 224 |
| :vgg11_tv_in1k | 1000 | 512 | 224 |
| :vgg13_bn_tv_in1k | 1000 | 512 | 224 |
| :vgg13_tv_in1k | 1000 | 512 | 224 |
| :vgg16_bn_tv_in1k | 1000 | 512 | 224 |
| :vgg16_tv_in1k | 1000 | 512 | 224 |
| :vgg19_bn_tv_in1k | 1000 | 512 | 224 |
| :vgg19_tv_in1k | 1000 | 512 | 224 |
ViT
Luximm.Models.ViTVariant — Type
ViTVariant
Architectural config for a single Vision Transformer variant.
Fields:
- name: lookup key (e.g. :vit_base_patch16_224_augreg2_in21k_ft_in1k).
- depth: number of transformer encoder blocks.
- embed_dim: token / channel width.
- num_heads: attention heads (head_dim = embed_dim ÷ num_heads).
- patch: patch side length (16).
- img_size: native input resolution the position embedding was trained at. Enforced by the constructor, since absolute pos-embed has no interpolation path yet.
- hf_repo: HuggingFace repo containing model.safetensors.
- default_num_classes: head dimension the released weights ship with.
- default_input_size: native training resolution (== img_size).
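The derived quantities follow from integer division. In the sketch below, vit_head_dim and vit_num_patches are hypothetical helpers, and num_heads = 12 is the standard ViT-Base value, assumed here rather than read from the registry.

```julia
# Per-head width: the embedding is split evenly across the attention heads.
vit_head_dim(embed_dim::Int, num_heads::Int) = embed_dim ÷ num_heads

# Number of patch tokens for a square image cut into square patches.
vit_num_patches(img_size::Int, patch::Int) = (img_size ÷ patch)^2

vit_head_dim(768, 12)     # 64
vit_num_patches(224, 16)  # 196
vit_num_patches(384, 16)  # 576
```

The last two lines show why img_size is enforced: a position embedding trained for 196 patch positions has no entry for the extra positions a 384 input would produce unless it is interpolated.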
Luximm.Models.VIT_VARIANTS — Constant
VIT_VARIANTS :: Dict{Symbol, ViTVariant}
Lookup table for the Vision Transformer variants ported from timm. Keys are the timm model name with dots rewritten as underscores.
Registered variants
| Variant | num_classes | num_features | input size |
|---|---|---|---|
| :vit_base_patch16_224_augreg2_in21k_ft_in1k | 1000 | 768 | 224 |
CoAtNet
Luximm.Models.CoAtNetVariant — Type
CoAtNetVariant
Architectural config for a single CoAtNet variant.
Fields:
- name: lookup key (e.g. :coatnet_0_rw_224_sw_in1k).
- depths: per-stage block count (d1, d2, d3, d4).
- dims: per-stage output channel widths (c1, c2, c3, c4); c4 is num_features.
- stem_width: the two stem conv widths (s1, s2); s2 feeds stage 1.
- block_types: per-stage block kind, :C (MBConv) or :T (transformer).
- stride_mode: how MBConv blocks downsample: :pool (avg-pool the main path, stride-1 convs) or :dw (stride the depthwise conv, no pool).
- attn_early: MBConv SE placement: true puts SE between the depthwise conv and norm2 (timm se_early), false after norm2 (timm se).
- se_act: MBConv SE bottleneck activation, :relu or :silu.
- transformer_shortcut_bias: whether the transformer downsample shortcut's 1x1 expand conv carries a bias.
- layer_scale: whether transformer blocks apply LayerScale (ls1 / ls2 per-channel gamma) to the attention and MLP residual branches.
- img_size: native input resolution (enforced; the transformer relative-position bias is sized to the per-stage feature map).
- hf_repo: HuggingFace repo containing model.safetensors.
- default_num_classes: head dimension the released weights ship with.
- default_input_size: native training resolution (== img_size).
Luximm.Models.COATNET_VARIANTS — Constant
COATNET_VARIANTS :: Dict{Symbol, CoAtNetVariant}
Lookup table for the CoAtNet variants ported from timm. Keys are the timm model name with dots rewritten as underscores.
Registered variants
| Variant | num_classes | num_features | input size |
|---|---|---|---|
| :coatnet_0_rw_224_sw_in1k | 1000 | 768 | 224 |
| :coatnet_1_rw_224_sw_in1k | 1000 | 768 | 224 |
| :coatnet_2_rw_224_sw_in12k | 11821 | 1024 | 224 |
| :coatnet_2_rw_224_sw_in12k_ft_in1k | 1000 | 1024 | 224 |
| :coatnet_3_rw_224_sw_in12k | 11821 | 1536 | 224 |