Common issues
Quick index of user-reported symptoms. Each entry lists the likely cause and a concrete fix.
API calls
401 Unauthorized
Symptom: API calls return {"error": "unauthorized"}.
Causes and fixes:
- Missing Authorization: Bearer eco_... header → add it
- Typo in the key → copy-paste from the API Keys page
- Key was revoked → create a new one
- Using a key from a different account → check the account switcher in the console
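A minimal Python sketch of building the header, assuming keys always start with the eco_ prefix shown in the header example above. The function name build_headers and the prefix check are illustrative only and are not part of any EcoLink SDK.

```python
def build_headers(api_key: str) -> dict:
    """Return request headers carrying the API key as a Bearer token."""
    # Strip whitespace that often sneaks in when copy-pasting a key.
    key = api_key.strip()
    # Assumption: keys begin with "eco_", as in the header example above.
    if not key.startswith("eco_"):
        raise ValueError("API key should start with 'eco_'")
    return {"Authorization": f"Bearer {key}"}
```

Pass the returned dictionary as the headers of whichever HTTP client you use. Catching a malformed key locally saves a round trip that would end in a 401.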
402 Payment Required — insufficient_balance
Cause: Your balance can't cover the request or the credit hold.
Fix: Stop resources you don't need to free held credit, or add balance.
402 Payment Required — model_not_priced
Cause: Model ID is registered but has no pricing set yet.
Fix: Ping #ecolink-support with the model ID.
404 Not Found on a model
Causes:
- Typo in the model ID
- For user inference: missing :<instance_id> suffix on the model field
- For user inference: your instance was terminated (check Compute → Model Instances)
Fix: verify the ID. For platform models, call GET /platform/models to list what's deployed.
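For user inference, the model field carries the instance ID as a suffix. A small sketch of composing it, assuming the format is exactly <model_id>:<instance_id> joined by one colon. The helper name is hypothetical.

```python
def user_inference_model(model_id: str, instance_id: str) -> str:
    """Compose the model field for a user inference request."""
    if not model_id or not instance_id:
        raise ValueError("both model_id and instance_id are required")
    # Assumption: the suffix is joined with a single colon,
    # giving "<model_id>:<instance_id>".
    return f"{model_id}:{instance_id}"
```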
429 Too Many Requests
Fix: back off with exponential delay, honor the retry-after response header, and batch requests where possible. See Rate limits.
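One way to compute the wait between retries, sketched in Python. It assumes retry-after arrives as a number of seconds. The base delay, the cap, and the use of jitter are illustrative choices, not documented EcoLink values.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Prefer the server's hint when it is a plain number of seconds.
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date; fall back to exponential delay
    # Upper bound doubles each attempt (base, 2*base, 4*base, ...) up to
    # `cap`; a random delay below that bound keeps many clients from
    # retrying in lockstep.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Call it with the attempt counter and the retry-after value from the 429 response, sleep for the returned number of seconds, then retry.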
503 Service Unavailable
Cause: No healthy pod for the requested model at this moment.
Fix: retry after a few seconds. If it persists for minutes on a single model, report in #ecolink-support.
My API call is very slow
Possible causes:
- Cold-start on a platform model that scaled down while idle — first request after a quiet period takes ~15–30s extra
- Long max_tokens: naturally slower completion
- The router picked a further-away region (check x-ecolink-region)
Fix: For user inference that you control, keep at least one replica running to avoid cold-start. For platform models, warm-pod persistence is automatic.
GPU instances
Instance stuck in pending for > 15 minutes
Likely cause: image pull is slow, or the pod came up but the platform's watcher missed the Ready event.
What happens automatically: a 5-minute watchdog reconciles stuck-pending rows. If a pod is actually running in K8s, the status heals. If no pod exists, status flips to failed.
Fix: wait up to 20 min. If still stuck, terminate and relaunch. If consistently stuck with the same image, it may be huge (> 10 GB) — try a slimmer variant. Report the instance ID in #ecolink-support.
Instance says running but the terminal won't open
Likely cause: the pod was preempted or died after the DB updated but before the terminal could attach.
Fix: refresh the detail page. If status is now preempted, wait for recovery. If failed or terminated, launch a new instance.
Duration expired and the instance auto-terminated before I extended
Cause: estimated_duration_hours is a hard commitment. You can't recover the same instance.
Fix: launch a fresh one with the same config. If you had a cloud drive attached, reattach it — your files are still there.
insufficient credit balance on launch
Cause: the credit hold for duration × GPU count × rate exceeds your balance.
Fix: shorter duration, fewer GPUs, or add credit.
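The hold arithmetic can be checked before launching. This sketch assumes the rate is a per-GPU hourly price. The function names are hypothetical.

```python
def credit_hold(duration_hours: float, gpu_count: int, hourly_rate: float) -> float:
    """Credit held at launch: duration x GPU count x rate."""
    # Assumption: `hourly_rate` is the price of one GPU for one hour.
    return duration_hours * gpu_count * hourly_rate


def can_launch(balance: float, duration_hours: float,
               gpu_count: int, hourly_rate: float) -> bool:
    """True when the balance covers the full credit hold."""
    return balance >= credit_hold(duration_hours, gpu_count, hourly_rate)
```

With made-up numbers, 8 hours on 4 GPUs at a rate of 2.50 holds 80 credits, so a balance of 50 is rejected. Halving the duration to 4 hours holds 40 and the launch fits.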
GPU count exceeds GPUs per node
Cause: more GPUs requested than a single node in the region has.
Fix: lower GPU count, or pick a different region.
Storage
Can't delete a cloud drive
Cause: it's attached to an active instance.
Fix: terminate the instance, then delete the drive.
Shared filesystem shows in_use but I think it's detached
Cause: in_use means at least one instance has it mounted. Another of your instances may still hold it.
Fix: check Compute → GPU Instances for instances with the filesystem attached. Terminate any leftover ones.
My instance terminated and my notebooks / model checkpoints are gone
Cause: files in the container's writable filesystem are lost at termination.
Prevention: attach a cloud drive or shared filesystem at launch. Save anything you want to keep into its mount path (e.g. /workspace, /shared).
User inference
Instance stuck in pending for > 20 minutes
Likely cause: slow image pull or model download (large Hugging Face repo).
Check: detail page shows per-region status pending → loading → running. Stuck at pending = image pull slow. Stuck at loading = model download or container init slow.
Fix: for 70B+ models, loading can legitimately take 15–30 min. If still stuck after 30 min, terminate and relaunch.
Requests time out / 504 Gateway Timeout
Cause: container isn't responding to /health or is hanging.
Fix: confirm the container works locally with docker run --gpus all. Bind to 0.0.0.0:<service_port>, not 127.0.0.1. Make sure /health is implemented.
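A minimal sketch of a container entry point that satisfies both requirements, using only the Python standard library. The response body and the port 8000 in the usage comment are assumptions. This page only states that /health must respond, not what it must return.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Assumption: a 200 with a small JSON body counts as healthy.
            body = b'{"status": "ok"}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port: int) -> ThreadingHTTPServer:
    # Bind to 0.0.0.0 so the server is reachable from outside the
    # container; 127.0.0.1 only accepts connections from inside it.
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)


# Usage (hypothetical service port):
#   make_server(8000).serve_forever()
```

A real inference server would add its own routes next to /health. The bind address is the part that most often differs between a container that works locally and one that times out behind the gateway.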
402 on my own user inference instance
Cause: balance hit $0 on the previous 24h cycle; instance was terminated.
Fix: check Model Instances — it'll show stopped with reason credit_depleted. Restore balance and launch again.
Teams and invites
The invitee never got the email
Cause: email delivery for invites isn't always reliable right now.
Workaround: open the Users page → find the pending invite → copy the invite link → send it via Slack or another channel. The link works identically to an emailed one.
I accepted an invite but don't see the account in my switcher
Cause: you need to be logged in to EcoLink before clicking the invite link.
Fix: log in at console.ecohash.com, then paste the invite URL into your browser. The new account will appear in the top-left switcher.
Still stuck?
When reporting in #ecolink-support, include:
- What you were trying to do
- Exact error message + HTTP status
- x-ecolink-request-id header (for API issues)
- Resource ID (for console issues)
- Approximate timestamp
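The checklist above can be gathered in code at the point where an API call fails. A rough sketch, assuming you have the status, body, and response headers at hand. The function name and the output layout are illustrative only.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def support_report(goal: str, status: int, error_body: str,
                   headers: Mapping[str, str],
                   resource_id: Optional[str] = None) -> str:
    """Format the details requested for a #ecolink-support report."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [
        f"What I was doing: {goal}",
        f"Error: HTTP {status} {error_body}",
        f"Request ID: {lowered.get('x-ecolink-request-id', 'n/a')}",
        f"Resource ID: {resource_id or 'n/a'}",
        f"Timestamp (UTC): {now}",
    ]
    return "\n".join(lines)
```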