Common issues

Quick index of user-reported symptoms. Each entry lists the likely cause and a concrete fix.

API calls

`401 Unauthorized`

Symptom: API calls return {"error": "unauthorized"}.

Causes and fixes:

Missing Authorization: Bearer eco_... header → add it
Typo in the key → copy-paste from the API Keys page
Key was revoked → create a new one
Using a key from a different account → check the account switcher in the console
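
As a minimal sketch of the first two fixes, the snippet below builds a request with the header attached and strips stray whitespace picked up during copy-paste. The URL is a placeholder rather than a real endpoint, and `build_request` is a hypothetical helper name.

```python
import urllib.request

# Placeholder URL for illustration only; substitute the endpoint you actually call.
EXAMPLE_URL = "https://api.example.invalid/v1/chat/completions"


def build_request(api_key: str, url: str = EXAMPLE_URL) -> urllib.request.Request:
    """Return a request that carries the Authorization: Bearer header."""
    key = api_key.strip()  # whitespace from copy-paste would corrupt the header
    if not key.startswith("eco_"):
        raise ValueError("expected a key that starts with 'eco_'")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {key}"})


# Nothing is sent here; the request object is only constructed and inspected.
req = build_request("  eco_example_key\n")
print(req.get_header("Authorization"))
```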

`402 Payment Required` — `insufficient_balance`

Cause: Your balance can't cover the request or the credit hold.

Fix: Stop resources you don't need to free held credit, or add balance.

`402 Payment Required` — `model_not_priced`

Cause: Model ID is registered but has no pricing set yet.

Fix: Ping #ecolink-support with the model ID.

`404 Not Found` on a model

Causes:

Typo in the model ID
For user inference: missing :<instance_id> suffix on the model field
For user inference: your instance was terminated (check Compute → Model Instances)

Fix: verify the ID. For platform models, call GET /platform/models to list what's deployed.
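
A small sketch of both checks, assuming the model field is simply the model ID followed by `:<instance_id>` for user inference. The helper names and example IDs are hypothetical, and the list passed to `is_deployed` stands in for whatever IDs GET /platform/models returns.

```python
from typing import Iterable, Optional


def model_field(model_id: str, instance_id: Optional[str] = None) -> str:
    """Build the model value; user inference appends ':<instance_id>'."""
    model_id = model_id.strip()
    if not model_id:
        raise ValueError("model ID is empty")
    return f"{model_id}:{instance_id}" if instance_id else model_id


def is_deployed(model_id: str, deployed_ids: Iterable[str]) -> bool:
    """Check an ID against the IDs listed by GET /platform/models."""
    return model_id in set(deployed_ids)


# Hypothetical IDs, for illustration only.
print(model_field("my-model", "inst-123"))
print(is_deployed("my-modle", ["my-model", "other-model"]))
```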

`429 Too Many Requests`

Fix: back off with exponential delay, respect retry-after, batch requests where possible. See Rate limits.
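
One way to sketch the backoff, assuming `retry-after` carries a number of seconds and that header names arrive lowercased. `send` is a hypothetical callable that performs one request and returns the status code and response headers; the base delay, cap, and jitter are illustrative choices, not platform values.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (HTTP status, response headers)


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # the server's hint takes priority
        except ValueError:
            pass  # not a plain number; fall back to exponential delay
    return min(cap, base * (2 ** attempt))


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it stops returning 429 or the attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for attempt in range(max_attempts - 1):
        status, headers = response
        if status != 429:
            break
        delay = backoff_delay(attempt, headers.get("retry-after"))
        sleep(delay + random.uniform(0, 0.1 * delay))  # small jitter spreads out retries
        response = send()
    return response
```

If the last attempt still returns 429, that response is returned unchanged so the caller can decide what to do next.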

`503 Service Unavailable`

Cause: No healthy pod for the requested model at this moment.

Fix: retry after a few seconds. If it persists for minutes on a single model, report in #ecolink-support.

My API call is very slow

Possible causes:

Cold-start on a platform model that scaled down while idle — first request after a quiet period takes ~15–30s extra
Long max_tokens — naturally slower completion
The router picked a more distant region (check x-ecolink-region)

Fix: For user inference that you control, keep at least one replica running to avoid cold-start. For platform models, warm-pod persistence is automatic.

GPU instances

Instance stuck in `pending` for > 15 minutes

Likely cause: image pull is slow, or the pod came up but the platform's watcher missed the Ready event.

What happens automatically: a 5-minute watchdog reconciles stuck-pending rows. If a pod is actually running in K8s, the status heals. If no pod exists, status flips to failed.

Fix: wait up to 20 min. If still stuck, terminate and relaunch. If the same image gets stuck repeatedly, it may be too large (> 10 GB); try a slimmer variant. If the problem continues, report the instance ID in #ecolink-support.

Instance says `running` but the terminal won't open

Likely cause: the pod was preempted or died after the DB updated but before the terminal could attach.

Fix: refresh the detail page. If status is now preempted, wait for recovery. If failed or terminated, launch a new instance.

Duration expired and the instance auto-terminated before I extended

Cause: estimated_duration_hours is a hard commitment. You can't recover the same instance.

Fix: launch a fresh one with the same config. If you had a cloud drive attached, reattach it — your files are still there.

`insufficient credit balance` on launch

Cause: the credit hold for duration × GPU count × rate exceeds your balance.

Fix: shorter duration, fewer GPUs, or add credit.
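
The hold can be estimated before launching. This is a sketch that assumes the rate is quoted per GPU per hour; the function names and the numbers in the example are hypothetical, not real prices.

```python
from decimal import Decimal


def credit_hold(duration_hours, gpu_count: int, rate_per_gpu_hour) -> Decimal:
    """Hold = duration x GPU count x rate (rate assumed to be per GPU-hour)."""
    return Decimal(str(duration_hours)) * gpu_count * Decimal(str(rate_per_gpu_hour))


def can_launch(balance, duration_hours, gpu_count: int, rate_per_gpu_hour) -> bool:
    """True when the hold does not exceed the balance."""
    return credit_hold(duration_hours, gpu_count, rate_per_gpu_hour) <= Decimal(str(balance))


# Made-up numbers: 24 hours on 4 GPUs at a rate of 2.50 against a balance of 200.
print(credit_hold(24, 4, "2.50"))
print(can_launch("200", 24, 4, "2.50"))
print(can_launch("200", 12, 4, "2.50"))
```

With these numbers the 24-hour launch is rejected and the 12-hour one fits, which is the "shorter duration" fix in practice.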

`GPU count exceeds GPUs per node`

Cause: more GPUs requested than a single node in the region has.

Fix: lower GPU count, or pick a different region.

Storage

Can't delete a cloud drive

Cause: it's attached to an active instance.

Fix: terminate the instance, then delete the drive.

Shared filesystem shows `in_use` but I think it's detached

Cause: in_use means at least one instance has the filesystem mounted. Another of your instances may still hold it.

Fix: check Compute → GPU Instances for instances with the filesystem attached. Terminate any leftover ones.

My instance terminated and my notebooks / model checkpoints are gone

Cause: files in the container's writable filesystem are lost at termination.

Prevention: attach a cloud drive or shared filesystem at launch. Save anything you want to keep into its mount path (e.g. /workspace, /shared).
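
A minimal sketch of saving into the mount path instead of the container filesystem. `persistent_path` is a hypothetical helper, and /workspace is just the example mount path from above; the check only confirms the directory exists, not that a drive is really mounted there.

```python
from pathlib import Path


def persistent_path(mount: str, *parts: str) -> Path:
    """Return a path under the mount directory, creating parent folders as needed."""
    root = Path(mount)
    if not root.is_dir():
        # Fail loudly rather than silently writing to storage that is lost at termination.
        raise FileNotFoundError(f"{mount} does not exist; is the drive attached?")
    target = root.joinpath(*parts)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```

For example, `persistent_path("/workspace", "checkpoints", "step_1000.bin").write_bytes(data)` writes a checkpoint onto the attached drive and raises if the mount directory is missing.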

User inference

Instance stuck in `pending` for > 20 minutes

Likely cause: a slow image pull or a slow model download (large Hugging Face repo).

Check: the detail page shows a per-region status that moves through pending → loading → running. Stuck at pending means the image pull is slow. Stuck at loading means the model download or container init is slow.

Fix: for 70B+ models, loading can legitimately take 15–30 min. If still stuck after 30 min, terminate and relaunch.

Requests time out / `504 Gateway Timeout`

Cause: container isn't responding to /health or is hanging.

Fix: confirm the container works locally with docker run --gpus all. Bind to 0.0.0.0:<service_port>, not 127.0.0.1. Make sure /health is implemented.
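
For reference, a bare-bones server that satisfies both requirements might look like the sketch below. It uses only the Python standard library; the JSON body returned by /health is an assumption, since the expected response format is not specified here, and `make_server` is a hypothetical helper name.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Assumed response shape: any 200 reply signals the container is alive.
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(port: int) -> ThreadingHTTPServer:
    # 0.0.0.0 accepts connections from outside the container; 127.0.0.1 would not.
    return ThreadingHTTPServer(("0.0.0.0", port), Handler)
```

Calling `make_server(port).serve_forever()` with your service port starts the server; a real inference container would add its own routes alongside /health.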

`402` on my own user inference instance

Cause: balance hit $0 on the previous 24h cycle; instance was terminated.

Fix: check Model Instances — it'll show stopped with reason credit_depleted. Restore balance and launch again.

Teams and invites

The invitee never got the email

Cause: email delivery for invites isn't always reliable right now.

Workaround: open the Users page → find the pending invite → copy the invite link → send it via Slack or another channel. The link works identically to an emailed one.

I accepted an invite but don't see the account in my switcher

Cause: you need to be logged in to EcoLink before clicking the invite link.

Fix: log in at console.ecohash.com, then paste the invite URL into your browser. The new account will appear in the top-left switcher.

Still stuck?

When reporting in #ecolink-support, include:

What you were trying to do
Exact error message + HTTP status
x-ecolink-request-id header (for API issues)
Resource ID (for console issues)
Approximate timestamp
