
Registering a model

Before launching an inference instance, register your model in EcoLink's Registry. This tells the platform where to find the weights. Two sources are supported:

  1. HuggingFace — pull from a public or gated HF repo at launch time
  2. Shared filesystem — weights already living on one of your shared filesystems

From HuggingFace

Best for: standard off-the-shelf models and fine-tunes you've pushed to HF Hub.

Register

  1. Console → Registry → Models → Add model.
  2. Pick HuggingFace repo as the source.
  3. Fill in:
    • Name — short friendly label (used as the model ID prefix in API calls)
    • HF repo — e.g., meta-llama/Llama-3.1-70B-Instruct
    • HF revision — branch / commit / tag (optional, defaults to main)
    • Category — chat / embedding / reranker / image / audio / video
    • HF token — only required for gated models (Llama, Gemma, and some others). Your token stays on the EcoLink side and isn't visible to other users.
  4. Save.
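If you script registrations or want to sanity-check values before typing them into the console, a minimal sketch of the form's fields might look like this. HFModelEntry, CATEGORIES, and ready_for_launch are hypothetical names for illustration, not part of an EcoLink SDK; the repo-ID check assumes HF's usual owner/repo format.

```python
from dataclasses import dataclass
from typing import Optional

# Categories offered by the Registry form.
CATEGORIES = {"chat", "embedding", "reranker", "image", "audio", "video"}


@dataclass(frozen=True)
class HFModelEntry:
    """Hypothetical local mirror of the HuggingFace registration fields."""
    name: str                       # short label, used as the model ID prefix
    hf_repo: str                    # e.g. "meta-llama/Llama-3.1-70B-Instruct"
    category: str                   # one of CATEGORIES
    hf_revision: str = "main"       # branch / commit / tag; the form defaults to main
    hf_token: Optional[str] = None  # only needed for gated repos

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("name is required")
        # Assumes HF's usual "owner/repo" repo-ID shape.
        owner, sep, repo = self.hf_repo.partition("/")
        if not (sep and owner and repo) or "/" in repo:
            raise ValueError(f"hf_repo should look like 'owner/repo', got {self.hf_repo!r}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category {self.category!r}")

    def ready_for_launch(self, gated: bool) -> bool:
        """Gated repos need a token; public repos don't."""
        return not gated or bool(self.hf_token)
```

For example, an entry for meta-llama/Llama-3.1-70B-Instruct with category="chat" reports ready_for_launch(gated=True) as False until you supply hf_token, since Llama repos are gated.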

What happens at launch

When you launch an inference instance with an HF-backed model:

  1. The first pod in each region pulls the model from HF onto a PersistentVolumeClaim (PVC): roughly 20 GB for a 7B model, 140 GB for a 70B model.
  2. Subsequent replicas in the same region share the PVC — weights are downloaded once per region, not per pod.
  3. Future launches of the same model reuse the PVC.
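The once-per-region caching described above can be modeled with a small simulation. This is a hypothetical sketch, assuming one shared PVC per (model, region) pair that survives pod restarts; count_downloads and the region names are illustrative only.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_downloads(pod_starts: Iterable[Tuple[str, str]]) -> Counter:
    """Count HF pulls for a sequence of (model, region) pod starts."""
    populated = set()      # (model, region) pairs whose PVC already holds the weights
    downloads = Counter()  # model -> number of pulls from HF
    for model, region in pod_starts:
        if (model, region) not in populated:
            downloads[model] += 1          # first pod in this region downloads
            populated.add((model, region))
        # Later replicas and later launches mount the existing PVC instead.
    return downloads
```

Starting four replicas in one region and two in another costs two downloads, not six, and relaunching all six later costs nothing extra.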

First-launch download time scales with model size and HF's bandwidth — typically 2–10 minutes for ≤13B models, 20–60 minutes for 70B+.
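To get a rough feel for those numbers, you can estimate weight size from parameter count and divide by sustained throughput. This sketch assumes 16-bit weights (2 bytes per parameter) and a throughput you choose yourself; the 100 MB/s figure below is an assumption, not a measured HF rate. It counts only the weight tensors, so it can come in below the volume sizes quoted above.

```python
def weight_size_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes); 2 bytes/param for fp16/bf16."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


def download_minutes(size_gb: float, throughput_mb_s: float) -> float:
    """Minutes to pull size_gb at a sustained throughput in MB/s (1 MB = 1e6 bytes)."""
    seconds = size_gb * 1000 / throughput_mb_s
    return seconds / 60
```

For example, a 70B model in bf16 is about 140 GB; at a sustained 100 MB/s that works out to roughly 23 minutes, inside the 20–60 minute range above.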

From a shared filesystem

Best for: fine-tunes you've trained on your own GPU instances, weights you want loaded from a fast local filesystem instead of HF, or custom model formats.

Put the weights in place

  1. Launch a GPU instance with a shared filesystem attached at /shared.
  2. Copy the model to a known location, e.g. /shared/models/my-llama-finetune/, containing config.json, tokenizer.json, and the .safetensors weight files.
  3. Terminate the instance (the filesystem persists).
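Before terminating the instance, you might check that the folder contains the files listed in step 2. A minimal sketch, assuming the model ships config.json, tokenizer.json, and one or more .safetensors files; some models use other tokenizer filenames (such as tokenizer.model), so adjust REQUIRED_FILES to match. missing_model_files is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Adjust to your model's actual file set.
REQUIRED_FILES = ("config.json", "tokenizer.json")


def missing_model_files(model_dir: str) -> List[str]:
    """Return what's missing from model_dir; an empty list means the layout looks complete."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"directory {model_dir}"]
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Weights may be one file or sharded (model-00001-of-00004.safetensors, ...).
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors weights")
    return missing
```

Run it on the GPU instance against /shared/models/my-llama-finetune; an empty list means the expected files are in place.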

Register

  1. Console → Registry → Models → Add model.
  2. Pick Shared filesystem as the source.
  3. Fill in:
    • Name — short label
    • Shared filesystem — pick one of your filesystems from the dropdown
    • Subfolder — the path inside the filesystem (e.g., models/my-llama-finetune). The full model path is <mount_path>/<subfolder>.
    • Category — same as above
    • Region — locked to the filesystem's region; the instance must launch in the same region
  4. Save.
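The full model path is the mount path joined with the subfolder. The sketch below shows that composition with Python's posixpath, plus a defensive check (our own addition, not documented platform behavior) that rejects subfolders that are absolute or climb out of the mount with "../". model_path is a hypothetical helper, and /shared is only the example mount path from the setup steps; the mount path on the inference instance may differ.

```python
import posixpath


def model_path(mount_path: str, subfolder: str) -> str:
    """Join <mount_path>/<subfolder>, refusing subfolders that escape the mount."""
    if posixpath.isabs(subfolder):
        raise ValueError("subfolder must be relative to the filesystem root")
    full = posixpath.normpath(posixpath.join(mount_path, subfolder))
    mount = posixpath.normpath(mount_path)
    # After normalization the result must still sit at or under the mount point.
    if full != mount and not full.startswith(mount.rstrip("/") + "/"):
        raise ValueError(f"subfolder {subfolder!r} escapes {mount_path!r}")
    return full
```

With the example values, model_path("/shared", "models/my-llama-finetune") gives /shared/models/my-llama-finetune; trailing slashes on either argument are normalized away.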

What happens at launch

The inference instance mounts the shared filesystem and loads weights directly from <mount_path>/<subfolder>, with no download. Startup typically takes 30–90 seconds, depending on model size and how quickly your container initializes.

Tradeoff: filesystem-backed models are pinned to a single region (the one their filesystem lives in). HF-backed models can run in any region.
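The placement rule can be summarized as a small function: HF-backed models can use every region, while filesystem-backed models are restricted to their filesystem's region. launchable_regions and the region names in the test are hypothetical.

```python
from typing import Optional, Set


def launchable_regions(all_regions: Set[str], fs_region: Optional[str]) -> Set[str]:
    """Regions where a model may launch.

    fs_region is None for HF-backed models, otherwise the region of the
    shared filesystem that holds the weights.
    """
    if fs_region is None:
        return set(all_regions)          # HF-backed: any region
    return {fs_region} & set(all_regions)  # FS-backed: pinned to one region
```

An empty result means the filesystem's region isn't among the regions you're considering, so the instance can't launch there.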

Updating or re-registering

  • Need a newer HF revision? Add a new registry entry with that revision instead of editing the existing entry in place. Existing inference instances keep using the old revision until they're restarted.
  • Need a different filesystem subfolder? Same pattern: register a new entry and launch a new inference instance from it.
  • Deleting a registered model is safe as long as no inference instance is using it. If one is, the UI blocks deletion until you stop that instance.
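These rules amount to treating registry entries as immutable and tracking which instances use each one. Below is a hypothetical in-memory sketch: Registry and its methods are illustrative, not an EcoLink API, and the "repo@revision" source string is our own notation. It enforces the recommended new-entry pattern by refusing to overwrite an existing name.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Registry:
    """Hypothetical model of the update/delete rules above."""
    entries: Dict[str, str] = field(default_factory=dict)      # name -> source
    in_use: Dict[str, Set[str]] = field(default_factory=dict)  # name -> running instance IDs

    def add(self, name: str, source: str) -> None:
        # New revision or subfolder: register a new entry rather than editing in place.
        if name in self.entries:
            raise ValueError(f"{name!r} exists; register a new entry instead")
        self.entries[name] = source
        self.in_use[name] = set()

    def launch(self, name: str, instance_id: str) -> None:
        self.in_use[name].add(instance_id)

    def stop(self, instance_id: str) -> None:
        for ids in self.in_use.values():
            ids.discard(instance_id)

    def delete(self, name: str) -> None:
        # Mirrors the UI: deletion is blocked while any instance uses the entry.
        if self.in_use[name]:
            raise RuntimeError(f"{name!r} is used by {sorted(self.in_use[name])}; stop them first")
        del self.entries[name]
        del self.in_use[name]
```

In use: register llama-v2 alongside llama-v1, move traffic to new instances, stop the old ones, then delete llama-v1.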

Next step

Launch an inference instance with your registered model