
Fine-tune and deploy

Looking for managed fine-tuning?

This page is the manual workflow — you run training yourself on a GPU instance with a shared filesystem, then deploy the result. If you'd rather have EcoLink handle the dataset → train → deploy loop end-to-end (LoRA only, no infra to manage), see the Fine-tuning section instead.

This walkthrough takes you from a base model on Hugging Face to a managed inference endpoint serving your fine-tuned weights. The running example is Tongyi-MAI/Z-Image-Turbo, but the same pattern applies to any model.

High-level flow:

  1. Create a shared filesystem to hold the weights and fine-tune output.
  2. Launch a GPU instance with the filesystem attached — use it to download the base model and run fine-tuning.
  3. Save the trained weights to the filesystem and release the GPU instance.
  4. Register the trained weights as a model and deploy an inference instance against them.

1. Create a shared filesystem

A shared filesystem persists independently of any GPU instance, which is exactly what we want for model weights.

  1. Console → Storage → Shared File Systems → Create Filesystem.
  2. Name: e.g. Z-Image-Turbo-Model-Path.
  3. Region: pick one that has capacity for the GPU type you plan to use. The inference instance you deploy later will be pinned to this region.
  4. Mount path: /mnt/shared (or whatever path you prefer — used by every instance that attaches this filesystem).
  5. Click Create Filesystem.

2. Launch a GPU instance for fine-tuning

  1. Console → Compute → GPU Instances → Launch.
  2. Name: e.g. Z-Image-Turbo-Instance.
  3. Region: same region as the shared filesystem above.
  4. GPU Type / GPUs per Instance: pick what your fine-tune needs.
  5. Container Image: Custom / Registry → enter docker.io/vllm/vllm-omni:v0.18.0 (or your own image with the training tools you need).
  6. Startup Command: sleep infinity. The container needs to stay alive long enough for you to download weights and run training interactively — the default entrypoint would exit immediately since no model is on disk yet.
  7. Storage → attach the shared filesystem you created in step 1 at its mount path.
  8. Set an Estimated Duration long enough for your download + fine-tune (you can extend later if needed).
  9. Click Launch.

Wait until the instance status is Running.
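Before downloading anything, it can be worth confirming from the instance's browser terminal that the shared filesystem is attached and writable. The following is a minimal sketch, not an EcoLink tool: `check_shared_mount` is a hypothetical helper name, and `/mnt/shared` is the example mount path from step 1.

```shell
# check_shared_mount DIR: succeed only if DIR exists and accepts writes.
# Hypothetical helper; DIR defaults to the example mount path /mnt/shared.
check_shared_mount() {
    local dir="${1:-/mnt/shared}"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir is not a directory" >&2
        return 1
    fi
    # Create and remove a scratch file to prove the path is writable.
    local probe
    probe="$(mktemp "$dir/.write-probe.XXXXXX")" || {
        echo "not writable: $dir" >&2
        return 1
    }
    rm -f "$probe"
    # Show which filesystem backs the directory and how much space is free.
    df -h "$dir"
}

# Usage on the instance:
#   check_shared_mount /mnt/shared
```

If the `df` output shows the container's root filesystem rather than a separate mount, the filesystem was probably not attached at launch, and anything written there would be lost when the instance is released.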

3. Download weights and fine-tune

  1. Open the instance → Console tab (browser terminal).

  2. Authenticate and pull the base model into the shared filesystem:

    hf auth login
    hf download Tongyi-MAI/Z-Image-Turbo \
        --local-dir /mnt/shared/Tongyi-MAI/Z-Image-Turbo

  3. (Optional) Verify the download works before you start training:

    vllm serve /mnt/shared/Tongyi-MAI/Z-Image-Turbo --omni --port 8000

    Stop it with Ctrl-C once it's serving cleanly.

  4. Run your fine-tuning workflow and save the output back into the shared filesystem, e.g. /mnt/shared/Z-Image-Turbo-fine-tuning. Anything under /mnt/shared persists when the GPU instance is released.

  5. When fine-tuning finishes, terminate the GPU instance. The weights stay on the shared filesystem.
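Because terminating the instance discards everything outside the shared filesystem, a quick check that the output folder actually holds files can save a wasted run. This is a minimal sketch under stated assumptions: `check_weights_dir` is a hypothetical helper, and it only confirms that files exist, not that they are valid weights.

```shell
# check_weights_dir DIR: succeed only if DIR exists and holds at least one file.
# Hypothetical helper; it does not validate the contents of the files.
check_weights_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir" >&2
        return 1
    fi
    # Count regular files at any depth; tr strips padding some wc builds add.
    local count
    count="$(find "$dir" -type f | wc -l | tr -d ' ')"
    if [ "$count" -eq 0 ]; then
        echo "empty: $dir contains no files" >&2
        return 1
    fi
    echo "$count files in $dir"
    # Report the total size so an obviously truncated save stands out.
    du -sh "$dir"
}

# Usage before terminating the instance:
#   check_weights_dir /mnt/shared/Z-Image-Turbo-fine-tuning
```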

Tip

You don't have to do the training on EcoLink at all. If you already have fine-tuned weights elsewhere, upload them into /mnt/shared/<your-folder>/ (see Uploading files) and skip straight to section 4, Register the trained model.

4. Register the trained model

  1. Console → Registry → Models → Add model.
  2. Name: e.g. Z-Image-Turbo-fine-tuning.
  3. Category: image (or whatever matches your model — chat, audio, etc.).
  4. Framework: Custom, because the weights are loaded from a filesystem rather than from a Hugging Face repo.
  5. Model Path: /mnt/shared/Z-Image-Turbo-fine-tuning (the path your container will see).
  6. Visibility: Private.
  7. Click Register Model.

The entry now appears in Registry → Models and is selectable when deploying an inference endpoint.

5. Deploy the fine-tuned model

  1. Console → Compute → Model Instances → Deploy Endpoint.
  2. Name: e.g. Z-Image-Turbo-fine-tuning-instance.
  3. Engine: Custom Container.
  4. Model source: EcoLink Registry → pick the Z-Image-Turbo-fine-tuning entry from step 4. The Model Path field is pre-filled from the registration (editable if you want to point at a subfolder).
  5. Container Image: docker.io/vllm/vllm-omni:v0.18.0 (the same image you used for training is a safe default).
  6. Startup Command: vllm serve /mnt/shared/Z-Image-Turbo-fine-tuning --omni --port 8000.
  7. Service Port: 8000.
  8. GPU Type / GPUs / Max replicas: set for your serving needs. Deployments backed by a shared filesystem run in a single region (the filesystem's region).
  9. Click Deploy.

Once running, the instance detail page shows the endpoint URL (e.g. https://72.inference.ecohash.com). Call it like any OpenAI-compatible endpoint:

curl https://72.inference.ecohash.com/v1/images/generations \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A futuristic city with flying cars, cinematic lighting",
    "n": 1,
    "size": "512x512"
  }'
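For scripted use, the same request can be wrapped so that the response body is saved to a file and a non-200 status fails loudly. This is a sketch, not an official client: `generate_image` is a hypothetical helper, the URL and key are the placeholders from the example above, and the shape of the response body is not assumed, so it is stored as-is.

```shell
# generate_image BASE_URL API_KEY OUT_FILE
# Hypothetical helper: POST the example request, save the raw response body
# to OUT_FILE, and fail unless the server answers with HTTP 200.
generate_image() {
    local base_url="$1"
    local api_key="$2"
    local out_file="$3"
    local status
    # -o writes the body to a file; -w prints only the status code to stdout.
    status="$(curl -sS -o "$out_file" -w '%{http_code}' \
        "$base_url/v1/images/generations" \
        -H "Authorization: Bearer $api_key" \
        -H "Content-Type: application/json" \
        -d '{"prompt": "A futuristic city with flying cars, cinematic lighting", "n": 1, "size": "512x512"}')" || return 1
    if [ "$status" != "200" ]; then
        echo "request failed with HTTP $status; see $out_file" >&2
        return 1
    fi
    echo "response saved to $out_file"
}

# Usage (placeholders from the example above):
#   generate_image https://72.inference.ecohash.com eco_YOUR_KEY response.json
```

Keeping the body on disk even when the call fails makes it easier to read the server's error message afterwards.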

Iteration loop

To try a new fine-tune run:

  • Re-attach the same shared filesystem to a fresh GPU instance (or restart the same one if it's still within its duration).
  • Train again, save to a new subfolder (e.g. /mnt/shared/Z-Image-Turbo-fine-tuning-v2).
  • Register a new model entry pointing at the new subfolder.
  • Deploy a new inference instance against it. The old instance can keep running until you're ready to cut over.

Create a new Registry entry for each revision rather than overwriting the same path. Existing inference instances cache the path at launch and won't pick up new weights until they are redeployed.
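One way to avoid overwriting an earlier run by accident is to derive the output folder name from what already exists on the filesystem. The sketch below assumes the `-vN` suffix convention used in the example above; `next_run_dir` is a hypothetical helper, and it only prints a path without creating anything.

```shell
# next_run_dir BASE: print the first BASE-vN (N >= 2) that does not exist yet.
# Hypothetical helper following the -v2, -v3, ... naming used above.
next_run_dir() {
    local base="$1"
    local n=2
    # Skip every version suffix that is already taken.
    while [ -e "${base}-v${n}" ]; do
        n=$((n + 1))
    done
    echo "${base}-v${n}"
}

# Usage on the training instance:
#   OUT_DIR="$(next_run_dir /mnt/shared/Z-Image-Turbo-fine-tuning)"
#   mkdir -p "$OUT_DIR"
```

The printed path is the one to save the new run into and to enter as the Model Path when registering the new entry.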
