Registering a model
Before launching an inference instance, register your model in EcoLink's Registry. This tells the platform where to find the weights. Two sources are supported:
- HuggingFace — pull from a public or gated HF repo at launch time
- Shared filesystem — weights already living on one of your shared filesystems
From HuggingFace
Best for: standard off-the-shelf models and fine-tunes you've pushed to HF Hub.
Register
- Console → Registry → Models → Add model.
- Pick HuggingFace repo as the source.
- Fill in:
  - Name: short friendly label (used as the model ID prefix in API calls)
  - HF repo: e.g., meta-llama/Llama-3.1-70B-Instruct
  - HF revision: branch, commit, or tag (optional, defaults to main)
  - Category: chat / embedding / reranker / image / audio / video
  - HF token: only required for gated models (Llama, Gemma, some others). Your token stays on the EcoLink side and isn't visible to other users.
- Save.
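Before filling in the form, it can help to sanity-check the values you plan to enter. The sketch below mirrors the form fields in a plain Python dataclass. `HFModelEntry`, `validate`, and the repo-name pattern are hypothetical helpers for illustration, not part of any EcoLink API, and the "owner/name" check is only a rough heuristic.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Categories offered by the registration form.
CATEGORIES = {"chat", "embedding", "reranker", "image", "audio", "video"}

# Assumption: HF repo IDs usually look like "owner/name".
REPO_PATTERN = re.compile(r"^[A-Za-z0-9][\w.\-]*/[A-Za-z0-9][\w.\-]*$")


@dataclass
class HFModelEntry:
    """Illustrative mirror of the form fields; not an EcoLink type."""
    name: str
    hf_repo: str
    category: str
    hf_revision: str = "main"        # optional in the form, defaults to main
    hf_token: Optional[str] = None   # only needed for gated repos


def validate(entry: HFModelEntry) -> list[str]:
    """Return a list of problems; an empty list means the entry looks sane."""
    problems = []
    if not entry.name.strip():
        problems.append("name is empty")
    if not REPO_PATTERN.match(entry.hf_repo):
        problems.append("hf_repo should look like 'owner/name'")
    if entry.category not in CATEGORIES:
        problems.append(f"unknown category: {entry.category}")
    if not entry.hf_revision:
        problems.append("hf_revision is empty")
    return problems
```

Calling `validate(HFModelEntry(name="llama-70b", hf_repo="meta-llama/Llama-3.1-70B-Instruct", category="chat"))` returns an empty list. The check does not confirm that the repo exists or that your token grants access to it.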
What happens at launch
When you launch an inference instance with an HF-backed model:
- The first pod in each region pulls the model from HF into a PersistentVolume (~20 GB for 7B models, ~140 GB for 70B).
- Subsequent replicas in the same region mount the same volume through its PersistentVolumeClaim (PVC), so weights are downloaded once per region, not once per pod.
- Future launches of the same model reuse the PVC.
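The once-per-region behaviour described above can be pictured with a toy model. This is only a sketch of the caching idea, not how EcoLink implements it; `RegionWeightCache` and `launch_pod` are made-up names.

```python
class RegionWeightCache:
    """Toy model of 'download once per region, share across replicas'."""

    def __init__(self) -> None:
        self.downloads: list[tuple[str, str]] = []  # (region, model) pairs fetched from HF
        self._volumes: set[tuple[str, str]] = set()  # volumes that already hold the weights

    def launch_pod(self, region: str, model: str) -> str:
        """Return 'downloaded' for the first pod in a region, else 'reused'."""
        key = (region, model)
        if key in self._volumes:
            return "reused"
        # First pod for this model in this region: fetch and remember.
        self._volumes.add(key)
        self.downloads.append(key)
        return "downloaded"
```

With this model, three pods for the same model in one region trigger a single download, while the first pod in a second region triggers another.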
First-launch download time scales with model size and HF's bandwidth — typically 2–10 minutes for ≤13B models, 20–60 minutes for 70B+.
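For a rough feel of these numbers, the arithmetic can be sketched as follows. The assumptions are mine, not EcoLink's: weights stored at 2 bytes per parameter (16-bit), 1 GB = 10^9 bytes, and a steady download throughput that you supply yourself. Both function names are illustrative.

```python
def weights_size_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate raw weight size in GB, assuming 1 GB = 1e9 bytes."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


def download_minutes(size_gb: float, throughput_mb_s: float) -> float:
    """Approximate download time in minutes at a steady throughput in MB/s."""
    return size_gb * 1000 / throughput_mb_s / 60
```

Under these assumptions a 70B model comes to about 140 GB, in line with the volume size quoted above, and at a hypothetical 100 MB/s it would take roughly 23 minutes. The same formula gives about 14 GB for a 7B model, so the ~20 GB figure presumably includes other files or headroom.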
From a shared filesystem
Best for: fine-tunes you've trained on your own GPU instances, weights you want loaded from a fast local filesystem instead of HF, or custom model formats.
Put the weights in place
- Launch a GPU instance with a shared filesystem attached at /shared.
- Put the model in a known location: /shared/models/my-llama-finetune/ with config.json, tokenizer.json, and .safetensors weights.
- Terminate the instance (the filesystem persists).
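Before terminating the instance, you may want to confirm the folder holds the files listed above. A minimal sketch using only the Python standard library follows; the function name is illustrative, and the file list is taken from this page rather than from any authoritative EcoLink specification.

```python
from pathlib import Path

# Files this page says should sit next to the weights.
REQUIRED_FILES = ("config.json", "tokenizer.json")


def missing_model_files(model_dir: str) -> list[str]:
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # At least one .safetensors file should be present at the top level.
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing
```

For example, `missing_model_files("/shared/models/my-llama-finetune")` returns an empty list when everything is in place.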
Register
- Console → Registry → Models → Add model.
- Pick Shared filesystem as the source.
- Fill in:
  - Name: short label
  - Shared filesystem: pick from your dropdown
  - Subfolder: the path inside the filesystem (e.g., models/my-llama-finetune). The full model path is <mount_path>/<subfolder>.
  - Category: same as above
  - Region: locked to the filesystem's region; the instance must launch in the same region
- Save.
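The full model path is simply the mount path joined with the subfolder. A small sketch of that rule, with `model_path` as a hypothetical helper, shows why the subfolder is entered relative to the filesystem root rather than as an absolute path:

```python
import posixpath


def model_path(mount_path: str, subfolder: str) -> str:
    """Join <mount_path>/<subfolder>, tolerating stray slashes."""
    # Strip slashes so the subfolder stays relative to the mount point.
    return posixpath.join(mount_path, subfolder.strip("/"))
```

With a filesystem mounted at /shared and the subfolder models/my-llama-finetune, this yields /shared/models/my-llama-finetune.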
What happens at launch
The inference instance mounts the shared filesystem and loads weights directly from <mount_path>/<subfolder> — no download. Start-up is typically 30–90 seconds depending on model size and how quickly your container initializes.
Tradeoff: filesystem-backed models are pinned to a single region (the one their filesystem lives in). HF-backed models can run in any region.
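The region rule can be written as a one-line check. The sketch below uses made-up types and source labels (`RegisteredModel`, "huggingface", "shared_filesystem") purely to illustrate the constraint, and the region names in the example are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegisteredModel:
    """Illustrative registry entry; field names are assumptions."""
    name: str
    source: str                       # "huggingface" or "shared_filesystem"
    fs_region: Optional[str] = None   # set only for filesystem-backed models


def can_launch_in(model: RegisteredModel, region: str) -> bool:
    """Filesystem-backed models are pinned to one region; HF-backed are not."""
    if model.source == "shared_filesystem":
        return model.fs_region == region
    return True
```

For instance, a filesystem-backed model with `fs_region="region-a"` passes the check only for "region-a", while an HF-backed model passes for any region.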
Updating or re-registering
- To pull a newer HF revision, add a new registry entry with the new revision rather than editing the existing entry in place; existing inference instances keep using the old revision until restarted.
- To change the filesystem subfolder, follow the same pattern: new registry entry, new inference instance.
- Deleting a registered model from the Registry is safe as long as no inference instance is using it. If one is, the UI blocks deletion until you stop the instance.
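The deletion rule amounts to a guard that refuses to remove an entry while an instance still depends on it. The following is a sketch of that logic only; the data shapes (`registry` as a dict, `instances` as dicts with "id", "model_id", and "state" keys) and the exception name are assumptions, not EcoLink's data model.

```python
class ModelInUseError(Exception):
    """Raised when a model is still referenced by a running instance."""


def delete_model(registry: dict, instances: list[dict], model_id: str) -> None:
    """Remove model_id from registry unless a non-stopped instance uses it."""
    users = [
        inst["id"]
        for inst in instances
        if inst["model_id"] == model_id and inst["state"] != "stopped"
    ]
    if users:
        raise ModelInUseError(f"stop these instances first: {users}")
    del registry[model_id]
```

Once the blocking instance is stopped, the same call succeeds and the entry is removed.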