Fine-tune and deploy

Looking for managed fine-tuning?

This page is the manual workflow — you run training yourself on a GPU instance with a shared filesystem, then deploy the result. If you'd rather have EcoLink handle the dataset → train → deploy loop end-to-end (LoRA only, no infra to manage), see the Fine-tuning section instead.

This walkthrough takes you from a base model on HuggingFace all the way to a managed inference endpoint serving your fine-tuned weights. The running example is Tongyi-MAI/Z-Image-Turbo, but the pattern is the same for any model.

High-level flow:

Create a shared filesystem to hold the weights and fine-tune output.
Launch a GPU instance with the filesystem attached — use it to download the base model and run fine-tuning.
Save the trained weights to the filesystem and release the GPU instance.
Register the trained weights as a model and deploy an inference instance against them.

1. Create a shared filesystem

A shared filesystem persists independently of any GPU instance, which is exactly what we want for model weights.

Console → Storage → Shared File Systems → Create Filesystem.
Name: e.g. Z-Image-Turbo-Model-Path.
Region: pick one that has capacity for the GPU type you plan to use. The inference instance you deploy in step 5 will be pinned to this region.
Mount path: /mnt/shared (or whatever path you prefer — used by every instance that attaches this filesystem).
Click Create Filesystem.

2. Launch a GPU instance for fine-tuning

Console → Compute → GPU Instances → Launch.
Name: e.g. Z-Image-Turbo-Instance.
Region: same region as the shared filesystem above.
GPU Type / GPUs per Instance: pick what your fine-tune needs.
Container Image: Custom / Registry → enter docker.io/vllm/vllm-omni:v0.18.0 (or your own image with the training tools you need).
Startup Command: sleep infinity. The container needs to stay alive long enough for you to download weights and run training interactively — the default entrypoint would exit immediately since no model is on disk yet.
Storage → attach the shared filesystem you created in step 1 at its mount path.
Set an Estimated Duration long enough for your download + fine-tune (you can extend later if needed).
Click Launch.

Wait until the instance status is Running.

3. Download weights and fine-tune

Open the instance → Console tab (browser terminal).

Authenticate and pull the base model into the shared filesystem:

hf auth login
hf download Tongyi-MAI/Z-Image-Turbo \
  --local-dir /mnt/shared/Tongyi-MAI/Z-Image-Turbo

(Optional) Verify the download works before you start training:
```
vllm serve /mnt/shared/Tongyi-MAI/Z-Image-Turbo --omni --port 8000
```
Stop it with Ctrl-C once it's serving cleanly.
Run your fine-tuning workflow and save the output back into the shared filesystem, e.g. /mnt/shared/Z-Image-Turbo-fine-tuning. Anything under /mnt/shared persists when the GPU instance is released.
When fine-tuning finishes, terminate the GPU instance. The weights stay on the shared filesystem.
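
Before terminating the instance, it can be worth confirming that the output really landed on the shared filesystem. The following is a minimal sketch, not an EcoLink tool: check_output_dir is a hypothetical helper name, and it only checks that the directory exists and contains at least one file. It does not validate the weights themselves.

```shell
# Sketch: confirm the fine-tune output exists before releasing the GPU instance.
# check_output_dir is a hypothetical helper, not part of EcoLink or vLLM.
check_output_dir() {
  out_dir="$1"
  if [ ! -d "$out_dir" ]; then
    echo "missing: $out_dir" >&2
    return 1
  fi
  # Count regular files anywhere under the directory.
  file_count=$(find "$out_dir" -type f | wc -l | tr -d ' ')
  if [ "$file_count" -eq 0 ]; then
    echo "empty: $out_dir" >&2
    return 1
  fi
  # Report the file count and total size on disk.
  echo "ok: $out_dir ($file_count files, $(du -sh "$out_dir" | cut -f1))"
}
```

For the running example, you would call it as check_output_dir /mnt/shared/Z-Image-Turbo-fine-tuning and terminate the instance only if it prints an ok line.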

Tip

You don't have to do the training on EcoLink at all. If you already have fine-tuned weights elsewhere, just upload them into /mnt/shared/<your-folder>/ (see Uploading files) and skip straight to step 4.
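
If you upload weights from elsewhere, a checksum manifest is one way to confirm the copy on the shared filesystem matches the source. This is a sketch under assumptions: write_manifest and verify_manifest are hypothetical helper names, the manifest file name SHA256SUMS is arbitrary, and the commands assume GNU coreutils and findutils on both machines.

```shell
# Sketch: verify an upload with a SHA-256 manifest (assumes GNU sha256sum, find, sort, xargs).
# Both helper names are hypothetical.

# Run on the source folder before uploading; writes SHA256SUMS next to the files.
write_manifest() {
  ( cd "$1" && find . -type f ! -name SHA256SUMS -print0 \
      | sort -z | xargs -0 -r sha256sum > SHA256SUMS )
}

# Run on the uploaded copy; returns non-zero if any file is missing or differs.
verify_manifest() {
  ( cd "$1" && sha256sum --quiet -c SHA256SUMS )
}
```

Run write_manifest on the local folder, upload the folder including SHA256SUMS, then run verify_manifest on /mnt/shared/<your-folder> from a terminal that has the filesystem mounted.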

4. Register the trained model

Console → Registry → Models → Add model.
Name: e.g. Z-Image-Turbo-fine-tuning.
Category: image (or whatever matches your model — chat, audio, etc.).
Framework: Custom — we're loading weights from a filesystem, not from a HuggingFace repo.
Model Path: /mnt/shared/Z-Image-Turbo-fine-tuning (the path your container will see).
Visibility: Private.
Click Register Model.

The entry now appears in Registry → Models and is selectable when deploying an inference endpoint.

5. Deploy the fine-tuned model

Console → Compute → Model Instances → Deploy Endpoint.
Name: e.g. Z-Image-Turbo-fine-tuning-instance.
Engine: Custom Container.
Model source: EcoLink Registry → pick the Z-Image-Turbo-fine-tuning entry from step 4. The Model Path field is pre-filled from the registration (editable if you want to point at a subfolder).
Container Image: docker.io/vllm/vllm-omni:v0.18.0 (the same image you used for training in step 2 is a safe default).
Startup Command: vllm serve /mnt/shared/Z-Image-Turbo-fine-tuning --omni --port 8000.
Service Port: 8000.
GPU Type / GPUs / Max replicas: set for your serving needs. Shared-filesystem-backed deploys run in a single region (the filesystem's region).
Click Deploy.

Once running, the instance detail page shows the endpoint URL (e.g. https://72.inference.ecohash.com). Call it like any OpenAI-compatible endpoint:

curl https://72.inference.ecohash.com/v1/images/generations \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A futuristic city with flying cars, cinematic lighting",
    "n": 1,
    "size": "512x512"
  }'
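
A freshly deployed endpoint may reject requests while the container is still loading weights, so scripted callers often wrap the first request in a retry loop. The sketch below shows one way to do that; retry_until_ok is a hypothetical helper, and the attempt count and delay are placeholders to tune.

```shell
# Sketch: retry a command until it succeeds or the attempts run out.
# retry_until_ok is a hypothetical helper, not an EcoLink tool.
# Usage: retry_until_ok <attempts> <delay-seconds> <command> [args...]
retry_until_ok() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the wrapped command; stop as soon as it exits 0.
    if "$@"; then
      return 0
    fi
    # Only wait if another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      echo "attempt $i/$attempts failed; retrying in ${delay}s" >&2
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "giving up after $attempts attempts" >&2
  return 1
}
```

For example, retry_until_ok 30 10 curl -fsS -o /dev/null -H "Authorization: Bearer eco_YOUR_KEY" https://72.inference.ecohash.com/v1/models polls for up to about five minutes. The /v1/models path is an assumption based on vLLM's OpenAI-compatible server; substitute whatever health or listing route your endpoint actually exposes.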

Iteration loop

To try a new fine-tune run:

Re-attach the same shared filesystem to a fresh GPU instance (or restart the same one if it's still within its duration).
Train again, save to a new subfolder (e.g. /mnt/shared/Z-Image-Turbo-fine-tuning-v2).
Register a new model entry pointing at the new subfolder.
Deploy a new inference instance against it. The old instance can keep running until you're ready to cut over.

Prefer creating new Registry entries per revision rather than overwriting the same path — existing inference instances cache the path at launch and won't pick up new weights until redeployed.
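
To avoid overwriting an earlier run by accident, the output folder for each run can be chosen mechanically. This is a minimal sketch assuming the -vN suffix convention from the example above; next_run_dir is a hypothetical helper that only prints a path and does not create anything.

```shell
# Sketch: print the next unused versioned output folder under a base directory.
# next_run_dir is a hypothetical helper; the -vN suffix mirrors the example paths.
next_run_dir() {
  base="$1"
  name="$2"
  # The first run uses the bare name; later runs get -v2, -v3, ...
  if [ ! -e "$base/$name" ]; then
    echo "$base/$name"
    return 0
  fi
  n=2
  while [ -e "$base/${name}-v${n}" ]; do
    n=$((n + 1))
  done
  echo "$base/${name}-v${n}"
}
```

Calling next_run_dir /mnt/shared Z-Image-Turbo-fine-tuning on the GPU instance prints the path to pass to your training script, and the same path then goes into the Model Path field of the new Registry entry.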
