Fine-tune and deploy
This page is the manual workflow — you run training yourself on a GPU instance with a shared filesystem, then deploy the result. If you'd rather have EcoLink handle the dataset → train → deploy loop end-to-end (LoRA only, no infra to manage), see the Fine-tuning section instead.
This walkthrough takes you from a base model on HuggingFace all the way to a managed inference endpoint serving your fine-tuned weights. The running example is Tongyi-MAI/Z-Image-Turbo, but the pattern is the same for any model.
High-level flow:
- Create a shared filesystem to hold the weights and fine-tune output.
- Launch a GPU instance with the filesystem attached — use it to download the base model and run fine-tuning.
- Save the trained weights to the filesystem and release the GPU instance.
- Register the trained weights as a model and deploy an inference instance against them.
1. Create a shared filesystem
A shared filesystem persists independently of any GPU instance, which is exactly what we want for model weights.
- Console → Storage → Shared File Systems → Create Filesystem.
- Name: e.g. Z-Image-Turbo-Model-Path.
- Region: pick one that has capacity for the GPU type you plan to use. The inference instance you deploy later will be pinned to this region.
- Mount path: /mnt/shared (or whatever path you prefer; it is used by every instance that attaches this filesystem).
- Click Create Filesystem.
2. Launch a GPU instance for fine-tuning
- Console → Compute → GPU Instances → Launch.
- Name: e.g. Z-Image-Turbo-Instance.
- Region: same region as the shared filesystem above.
- GPU Type / GPUs per Instance: pick what your fine-tune needs.
- Container Image: Custom / Registry → enter docker.io/vllm/vllm-omni:v0.18.0 (or your own image with the training tools you need).
- Startup Command: sleep infinity. The container needs to stay alive long enough for you to download weights and run training interactively; the default entrypoint would exit immediately, since no model is on disk yet.
- Storage → attach the shared filesystem you created in step 1 at its mount path.
- Set an Estimated Duration long enough for your download + fine-tune (you can extend later if needed).
- Click Launch.
Wait until the instance status is Running.
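Before pulling a large model, it can help to confirm from the instance's terminal (opened in step 3) that the mount path is writable and has room for the weights. The following is a minimal POSIX shell sketch of such a check. The helper name has_free_kib and the 50 GiB figure in the example are illustrative assumptions, not EcoLink requirements.

```shell
# Hypothetical helper (not an EcoLink tool): succeed only if DIR is a writable
# directory whose filesystem has at least NEEDED_KIB kibibytes available.
# Usage: has_free_kib DIR NEEDED_KIB
has_free_kib() {
    dir="$1"
    needed="$2"
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "not a writable directory: $dir" >&2
        return 1
    fi
    # df -Pk prints a header line, then one line for the filesystem holding DIR;
    # the fourth column is the available space in KiB.
    avail="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
    [ "$avail" -ge "$needed" ]
}

# Example (50 GiB is an arbitrary illustrative threshold):
# has_free_kib /mnt/shared 52428800 && echo "enough space"
```

If the check fails, the usual causes are a filesystem that was not attached in the Storage step or a mount path that differs from the one chosen in step 1.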
3. Download weights and fine-tune
- Open the instance → Console tab (browser terminal).
- Authenticate and pull the base model into the shared filesystem:
    hf auth login
    hf download Tongyi-MAI/Z-Image-Turbo \
      --local-dir /mnt/shared/Tongyi-MAI/Z-Image-Turbo
- (Optional) Verify the download works before you start training:
    vllm serve /mnt/shared/Tongyi-MAI/Z-Image-Turbo --omni --port 8000
  Stop it with Ctrl-C once it's serving cleanly.
- Run your fine-tuning workflow and save the output back into the shared filesystem, e.g. /mnt/shared/Z-Image-Turbo-fine-tuning. Anything under /mnt/shared persists when the GPU instance is released.
- When fine-tuning finishes, terminate the GPU instance. The weights stay on the shared filesystem.
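Because anything saved outside the mount is lost when the instance is released, a quick check before terminating can be worthwhile. The sketch below assumes only that the fine-tune wrote regular files somewhere under the output directory. The helper name verify_output_dir is hypothetical, and the check does not validate the weights themselves.

```shell
# Hypothetical helper: succeed only if DIR exists and contains at least one
# regular file, then print its total size in KiB.
verify_output_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # Look for at least one regular file anywhere under DIR.
    if [ -z "$(find "$dir" -type f | head -n 1)" ]; then
        echo "no files under: $dir" >&2
        return 1
    fi
    # Total size in KiB, so an obviously truncated save stands out.
    du -sk "$dir"
}

# Example, using the output path from this walkthrough:
# verify_output_dir /mnt/shared/Z-Image-Turbo-fine-tuning
```

A non-zero exit status means the output is not where step 4 expects it, so it is safer to fix the save path before releasing the GPU instance.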
You don't have to do the training on EcoLink at all. If you already have fine-tuned weights elsewhere, just upload them into /mnt/shared/<your-folder>/ (see Uploading files) and skip straight to step 4.
4. Register the trained model
- Console → Registry → Models → Add model.
- Name: e.g. Z-Image-Turbo-fine-tuning.
- Category: image (or whatever matches your model: chat, audio, etc.).
- Framework: Custom (we're loading weights from a filesystem, not from a HuggingFace repo).
- Model Path: /mnt/shared/Z-Image-Turbo-fine-tuning (the path your container will see).
- Visibility: Private.
- Click Register Model.
The entry now appears in Registry → Models and is selectable when deploying an inference endpoint.
5. Deploy the fine-tuned model
- Console → Compute → Model Instances → Deploy Endpoint.
- Name: e.g. Z-Image-Turbo-fine-tuning-instance.
- Engine: Custom Container.
- Model source: EcoLink Registry → pick the Z-Image-Turbo-fine-tuning entry from step 4. The Model Path field is pre-filled from the registration (editable if you want to point at a subfolder).
- Container Image: docker.io/vllm/vllm-omni:v0.18.0 (the same image you used for training is a safe default).
- Startup Command: vllm serve /mnt/shared/Z-Image-Turbo-fine-tuning --omni --port 8000
- Service Port: 8000.
8000. - GPU Type / GPUs / Max replicas: set for your serving needs. Shared-filesystem-backed deploys run in 1 region (the filesystem's region).
- Click Deploy.
Once running, the instance detail page shows the endpoint URL (e.g. https://72.inference.ecohash.com). Call it like any OpenAI-compatible endpoint:
curl https://72.inference.ecohash.com/v1/images/generations \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A futuristic city with flying cars, cinematic lighting",
    "n": 1,
    "size": "512x512"
  }'
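A freshly deployed endpoint may need some time to load weights before it answers requests. One way to handle that is to retry a cheap request until it succeeds. The sketch below is a generic retry helper in POSIX shell; the name wait_for is hypothetical, and the commented example assumes the endpoint forwards vLLM's /v1/models route, which this page does not confirm.

```shell
# Hypothetical helper: run COMMAND until it succeeds or ATTEMPTS are used up.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "ready after $i attempt(s)"
            return 0
        fi
        # No need to sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "not ready after $attempts attempt(s)" >&2
    return 1
}

# Example (assumption: /v1/models is reachable through the endpoint URL):
# wait_for 30 10 curl -fsS -H "Authorization: Bearer eco_YOUR_KEY" \
#     https://72.inference.ecohash.com/v1/models
```

With curl's -f flag, HTTP error responses count as failures, so the loop keeps retrying until the server returns a successful status.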
Iteration loop
To try a new fine-tune run:
- Re-attach the same shared filesystem to a fresh GPU instance (or restart the same one if it's still within its duration).
- Train again, save to a new subfolder (e.g. /mnt/shared/Z-Image-Turbo-fine-tuning-v2).
- Register a new model entry pointing at the new subfolder.
- Deploy a new inference instance against it. The old instance can keep running until you're ready to cut over.
Prefer creating new Registry entries per revision rather than overwriting the same path — existing inference instances cache the path at launch and won't pick up new weights until redeployed.
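To avoid accidentally overwriting an earlier run, the output folder for each revision can be chosen mechanically. The sketch below follows the -v2 naming used in this walkthrough and prints the first unused versioned path. The helper name next_version_dir is hypothetical, and the scheme is only one possible convention.

```shell
# Hypothetical helper: print the first "<BASE>-vN" path that does not exist yet,
# starting at v2 (the unversioned BASE is treated as the first revision).
next_version_dir() {
    base="$1"
    n=2
    while [ -e "$base-v$n" ]; do
        n=$((n + 1))
    done
    echo "$base-v$n"
}

# Example: pick the output folder for the next training run.
# out_dir="$(next_version_dir /mnt/shared/Z-Image-Turbo-fine-tuning)"
```

The printed path can then be used both as the training output folder and as the Model Path of the new Registry entry.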