Common issues

Quick index of user-reported symptoms. Each entry lists the likely cause and a concrete fix.

API calls

401 Unauthorized

Symptom: API calls return {"error": "unauthorized"}.

Causes and fixes:

  • Missing Authorization: Bearer eco_... header → add it
  • Typo in the key → copy-paste from the API Keys page
  • Key was revoked → create a new one
  • Using a key from a different account → check the account switcher in the console
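
A minimal sketch of building the header, assuming only what the list above states (a Bearer token whose key starts with eco_). The helper name build_auth_headers is hypothetical and not part of any EcoLink SDK; it strips stray whitespace, a common side effect of copy-pasting a key.

```python
from typing import Dict


def build_auth_headers(api_key: str) -> Dict[str, str]:
    """Return request headers carrying the API key as a Bearer token."""
    # Copy-pasted keys often pick up a trailing newline or space.
    key = api_key.strip()
    # Assumption taken from the "Bearer eco_..." pattern above: keys start with "eco_".
    if not key.startswith("eco_"):
        raise ValueError("API key should start with 'eco_'")
    return {"Authorization": f"Bearer {key}"}
```

Pass the returned dictionary as the headers of whichever HTTP client you use.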

402 Payment Required (insufficient_balance)

Cause: Your balance can't cover the request or the credit hold.

Fix: Stop resources you don't need to free held credit, or add balance.

402 Payment Required (model_not_priced)

Cause: Model ID is registered but has no pricing set yet.

Fix: Ping #ecolink-support with the model ID.

404 Not Found on a model

Causes:

  • Typo in the model ID
  • For user inference: missing :<instance_id> suffix on the model field
  • For user inference: your instance was terminated (check Compute → Model Instances)

Fix: verify the ID. For platform models, call GET /platform/models to list what's deployed.

429 Too Many Requests

Fix: back off with exponentially increasing delays, honor the retry-after header when it is present, and batch requests where possible. See Rate limits.
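
The retry logic might look like the sketch below. It is client-agnostic: send is a hypothetical callable you supply that performs one request and returns the status code plus the response headers with lowercase keys. The base delay, cap, jitter, and attempt count are illustrative assumptions, not documented EcoLink limits, and only the numeric (seconds) form of retry-after is handled.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (HTTP status, headers with lowercase keys)


def retry_delay(attempt: int, retry_after: Optional[str],
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # the server's hint wins
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch; fall back to backoff
    return min(cap, base * 2 ** attempt)  # 1, 2, 4, 8, ... capped at `cap`


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns something other than 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        delay = retry_delay(attempt, headers.get("retry-after"))
        # A little random jitter keeps many clients from retrying in lockstep.
        sleep(delay + random.uniform(0, 0.1 * delay))
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter exists so the loop can be exercised without real waiting.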

503 Service Unavailable

Cause: No healthy pod for the requested model at this moment.

Fix: retry after a few seconds. If it persists for minutes on a single model, report in #ecolink-support.

My API call is very slow

Possible causes:

  • Cold start on a platform model that scaled down while idle: the first request after a quiet period takes about 15–30 s longer
  • Large max_tokens: longer completions take longer to generate
  • The router picked a more distant region (check the x-ecolink-region header)

Fix: For user inference that you control, keep at least one replica running to avoid cold-start. For platform models, warm-pod persistence is automatic.


GPU instances

Instance stuck in pending for > 15 minutes

Likely cause: image pull is slow, or the pod came up but the platform's watcher missed the Ready event.

What happens automatically: a 5-minute watchdog reconciles stuck-pending rows. If a pod is actually running in K8s, the status heals. If no pod exists, status flips to failed.

Fix: wait up to 20 min. If the instance is still stuck, terminate and relaunch. If the same image gets stuck repeatedly, it may be very large (> 10 GB); try a slimmer variant. Report the instance ID in #ecolink-support.

Instance says running but the terminal won't open

Likely cause: the pod was preempted or died after the DB updated but before the terminal could attach.

Fix: refresh the detail page. If status is now preempted, wait for recovery. If failed or terminated, launch a new instance.

Duration expired and the instance auto-terminated before I extended

Cause: estimated_duration_hours is a hard commitment. Once it expires, the instance is terminated and can't be recovered.

Fix: launch a fresh one with the same config. If you had a cloud drive attached, reattach it — your files are still there.

insufficient credit balance on launch

Cause: the credit hold for duration × GPU count × rate exceeds your balance.

Fix: shorter duration, fewer GPUs, or add credit.
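
As a worked example of the hold formula above, here is a small sketch. The function names and the numbers are hypothetical, and it assumes the rate is quoted per GPU per hour; check the console for your actual rate.

```python
def credit_hold(duration_hours: float, gpu_count: int, rate_per_gpu_hour: float) -> float:
    """Credit held at launch: duration x GPU count x rate."""
    return duration_hours * gpu_count * rate_per_gpu_hour


def can_launch(balance: float, duration_hours: float, gpu_count: int,
               rate_per_gpu_hour: float) -> bool:
    """True when the balance covers the hold (launch fails only if the hold exceeds it)."""
    return credit_hold(duration_hours, gpu_count, rate_per_gpu_hour) <= balance


# Illustrative numbers only: 8 hours on 4 GPUs at 2.50 credits per GPU-hour.
hold = credit_hold(8, 4, 2.50)
```

With these made-up numbers the hold is 80 credits, so a balance of 50 would be rejected, while halving either the duration or the GPU count brings the hold down to 40.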

GPU count exceeds GPUs per node

Cause: more GPUs requested than a single node in the region has.

Fix: lower GPU count, or pick a different region.


Storage

Can't delete a cloud drive

Cause: it's attached to an active instance.

Fix: terminate the instance, then delete the drive.

Shared filesystem shows in_use but I think it's detached

Cause: in_use means at least one instance has the filesystem mounted. Another of your instances may still hold it.

Fix: check Compute → GPU Instances for instances with the filesystem attached. Terminate any leftover ones.

My instance terminated and my notebooks / model checkpoints are gone

Cause: files in the container's writable filesystem are lost at termination.

Prevention: attach a cloud drive or shared filesystem at launch. Save anything you want to keep into its mount path (e.g. /workspace, /shared).
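
One way to make this hard to forget is to route every save through a helper that refuses to write when the mount is missing. This is a sketch: checkpoint_path and the checkpoints subdirectory are hypothetical names, and /workspace is just the example mount path from above.

```python
from pathlib import Path


def checkpoint_path(name: str, mount: str = "/workspace") -> Path:
    """Return a path under the persistent mount, failing loudly if it is absent."""
    mount_dir = Path(mount)
    if not mount_dir.is_dir():
        # Without the mount, files would land in the container's writable
        # filesystem and be lost at termination.
        raise FileNotFoundError(f"{mount} is not mounted; attach a drive at launch")
    target = mount_dir / "checkpoints" / name
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```

For example, write a checkpoint to checkpoint_path("step-1000.pt") instead of a relative path in the working directory.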


User inference

Instance stuck in pending for > 20 minutes

Likely cause: image pull OR model download (large HuggingFace repo).

Check: the detail page shows a per-region status that moves from pending to loading to running. Stuck at pending means the image pull is slow. Stuck at loading means the model download or container initialization is slow.

Fix: for 70B+ models, loading can legitimately take 15–30 min. If still stuck after 30 min, terminate and relaunch.

Requests time out / 504 Gateway Timeout

Cause: container isn't responding to /health or is hanging.

Fix: confirm the container works locally with docker run --gpus all. Bind to 0.0.0.0:<service_port>, not 127.0.0.1. Make sure /health is implemented.
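
For reference, a minimal standard-library sketch of a server that binds to all interfaces and answers /health. The port 8000 and the JSON body are assumptions for illustration; use your configured service port, and note that the platform may only require a 200 response.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

SERVICE_PORT = 8000  # hypothetical; use the service port configured for your instance


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/health":
            body = b'{"status": "ok"}'  # assumed body; the 200 status is the key part
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)


def make_server(port: int = SERVICE_PORT) -> HTTPServer:
    # "0.0.0.0" listens on every interface; "127.0.0.1" would only be
    # reachable from inside the container itself.
    return HTTPServer(("0.0.0.0", port), Handler)
```

Start it with make_server().serve_forever(). A real inference server would use its own framework, but the same two points apply: the bind address and a responsive /health route.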

402 on my own user inference instance

Cause: balance hit $0 on the previous 24h cycle; instance was terminated.

Fix: check Model Instances — it'll show stopped with reason credit_depleted. Restore balance and launch again.


Teams and invites

The invitee never got the email

Cause: email delivery for invites isn't always reliable right now.

Workaround: open the Users page → find the pending invite → copy the invite link → send it via Slack or another channel. The link works identically to an emailed one.

I accepted an invite but don't see the account in my switcher

Cause: you need to be logged in to EcoLink before clicking the invite link.

Fix: log in at console.ecohash.com, then paste the invite URL into your browser. The new account will appear in the top-left switcher.


Still stuck?

When reporting in #ecolink-support, include:

  • What you were trying to do
  • Exact error message + HTTP status
  • x-ecolink-request-id header (for API issues)
  • Resource ID (for console issues)
  • Approximate timestamp
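
If you report API issues often, a small helper can assemble these fields from a failed response. This is a sketch: support_report and its parameters are hypothetical, and it only assumes the x-ecolink-request-id header named in the list above.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def support_report(action: str, status: int, error_message: str,
                   headers: Mapping[str, str],
                   resource_id: Optional[str] = None,
                   when: Optional[datetime] = None) -> str:
    """Format the details listed above as plain text for pasting into a support message."""
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    request_id = lowered.get("x-ecolink-request-id", "not available")
    when = when or datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).isoformat(timespec="seconds")
    lines = [
        f"What I was doing: {action}",
        f"Error: HTTP {status} {error_message}",
        f"x-ecolink-request-id: {request_id}",
        f"Resource ID: {resource_id or 'n/a'}",
        f"Timestamp (UTC): {stamp}",
    ]
    return "\n".join(lines)
```

Call it with the status, error body, and headers of the failing response, then paste the result into #ecolink-support.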