
Cost and lifecycle

User inference instances are production-style serving endpoints. They are designed to keep running until your balance reaches zero, unlike GPU instances, which auto-stop after a fixed duration. This page covers the full billing and lifecycle model.

The 24h hold model

At launch, EcoLink places a 24-hour credit hold:

hold_amount = hourly_rate × gpu_count × total_replicas × 24

total_replicas depends on your model's source. HuggingFace or platform-model-based instances run 2 replicas across 2 regions. Shared-filesystem-backed instances run 1 replica in the filesystem's region. The launch dialog shows the per-GPU rate and the total hold for your selection.
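As a worked example of the hold formula, here is a minimal Python sketch. The source labels ("huggingface", "platform_model", "shared_filesystem") and the $2.50/GPU-hour rate are hypothetical values for illustration. Only the formula and the replica counts come from this page.

```python
# Minimal sketch of the launch-time 24h hold formula.
# ASSUMPTION: the source keys below are hypothetical labels, not EcoLink identifiers.

REPLICAS_BY_SOURCE = {
    "huggingface": 2,        # 2 replicas across 2 regions
    "platform_model": 2,     # 2 replicas across 2 regions
    "shared_filesystem": 1,  # 1 replica in the filesystem's region
}

HOLD_HOURS = 24


def hold_amount(hourly_rate: float, gpu_count: int, source: str) -> float:
    """hold_amount = hourly_rate × gpu_count × total_replicas × 24 (dollars).

    hourly_rate is the per-GPU rate shown in the launch dialog.
    """
    total_replicas = REPLICAS_BY_SOURCE[source]
    return hourly_rate * gpu_count * total_replicas * HOLD_HOURS


if __name__ == "__main__":
    # Hypothetical $2.50 per GPU-hour, 2 GPUs per replica.
    print(hold_amount(2.50, 2, "huggingface"))        # 240.0
    print(hold_amount(2.50, 2, "shared_filesystem"))  # 120.0
```

Switching the same configuration from a HuggingFace source to a shared filesystem halves the hold, because the replica count drops from 2 to 1.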

  • The hold comes out of your available balance immediately
  • The hold is not "spent", only reserved. What you actually pay is real uptime × rate
  • Unused hold is refunded when the instance stops

If your balance is less than the initial hold, the launch fails with 402 Payment Required.
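The hold semantics above (reserve at launch, charge actual uptime, refund the remainder, fail with 402 when the balance is short) can be modeled in a few lines. This is a minimal sketch, assuming a simple two-field account. The `Account` and `InsufficientBalance` names and their methods are hypothetical and do not represent EcoLink's API. Here `burn_rate` means hourly_rate × gpu_count × total_replicas.

```python
# Sketch of hold reservation and settlement.
# ASSUMPTION: Account / InsufficientBalance are hypothetical, illustrative names.
from dataclasses import dataclass


class InsufficientBalance(Exception):
    """Stands in for the launch failing with 402 Payment Required."""
    status_code = 402


@dataclass
class Account:
    available: float    # balance not reserved by any hold
    spent: float = 0.0  # total actually charged so far

    def reserve(self, hold: float) -> float:
        """Take the hold out of the available balance immediately."""
        if self.available < hold:
            raise InsufficientBalance(
                f"need ${hold:.2f}, have ${self.available:.2f}")
        self.available -= hold
        return hold

    def settle(self, hold: float, uptime_hours: float, burn_rate: float) -> float:
        """Charge uptime × burn_rate against the hold and return the refund.

        Any cost beyond the hold (overage) comes out of the available balance.
        """
        cost = uptime_hours * burn_rate
        refund = max(hold - cost, 0.0)
        overage = max(cost - hold, 0.0)
        self.spent += cost
        self.available += refund - overage
        return refund
```

For example, with $300 available and a $240 hold, the available balance drops to $60 at launch. Stopping after 6 hours at $10/h charges $60 and refunds $180, leaving $240 available.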

Every 24 hours: renew or partial-hold

Every 24h of continuous running, EcoLink's billing worker settles the previous cycle and creates the next:

  1. Consume the old hold: compute the actual cost for the past 24h, charge it against the hold, and refund any unused portion.
  2. Try to create a new 24h hold:
    • If balance ≥ full hold: full 24h hold created, cycle continues normally.
    • If balance < full hold but > $0: partial hold — whatever the balance covers. Example: balance $72, full hold would be $307.60 → partial 5.6h hold created with all $72 held. A warning notification is sent: "$X credit can cover Y more hours. Recharge to continue."
    • If balance = $0: instance is stopped, pods killed, termination_reason = 'credit_depleted', notification: "Instance terminated: credit balance depleted."
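The renewal decision in step 2 can be sketched as a small pure function. It assumes `balance` is the available balance after step 1 has settled the old hold, and `burn_rate` is hourly_rate × gpu_count × total_replicas. The `renew` and `Renewal` names are hypothetical. The notification strings follow the wording on this page.

```python
# Sketch of the 24h renewal decision: full hold, partial hold, or stop.
# ASSUMPTION: renew/Renewal are illustrative names, not EcoLink's billing worker.
from dataclasses import dataclass
from typing import Optional

HOLD_HOURS = 24


@dataclass
class Renewal:
    action: str           # "full", "partial" or "stop"
    hold: float           # dollars reserved for the next cycle
    covered_hours: float  # hours the new hold pays for
    notification: Optional[str] = None


def renew(balance: float, burn_rate: float) -> Renewal:
    full_hold = burn_rate * HOLD_HOURS
    if balance >= full_hold:
        return Renewal("full", full_hold, HOLD_HOURS)
    if balance > 0:
        # Partial hold: reserve the whole remaining balance.
        hours = balance / burn_rate
        msg = (f"${balance:.2f} credit can cover {hours:.1f} more hours. "
               "Recharge to continue.")
        return Renewal("partial", balance, hours, msg)
    return Renewal("stop", 0.0, 0.0,
                   "Instance terminated: credit balance depleted.")
```

With the example above, `renew(72.0, 307.60 / 24)` returns a partial hold of $72 covering about 5.6 hours.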

The cycle is autonomous — you don't have to press anything to keep it running. Just keep the balance topped up.

Ledger rows per cycle

Each 24h cycle generates visible ledger entries on the Billing page:

| Row | Amount | Description |
|---|---|---|
| refund | +$unused | Refund of the unused portion of the old hold (only if part of it was unused) |
| gpu_usage | -$overage | Extra charge if the hold wasn't enough (rare) |
| gpu_usage | -$next_hold | The new 24h prepay hold (full or partial) |

If an instance ran for the entire 24h cycle, its actual cost equals the old hold exactly, so you'd see only the gpu_usage -$next_hold row, with no refund and no overage.
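The ledger rows in the table above can be derived from three numbers per cycle. This is a minimal sketch, assuming amounts are compared in whole cents and that a stopped instance (next hold of $0) gets no new-hold row. The `cycle_ledger` function and the tuple row shape are hypothetical.

```python
# Sketch: ledger rows for one settled cycle as (row_type, signed_amount).
# ASSUMPTION: cycle_ledger and its row format are illustrative only.
from typing import List, Tuple


def cycle_ledger(old_hold: float, actual_cost: float,
                 next_hold: float) -> List[Tuple[str, float]]:
    rows: List[Tuple[str, float]] = []
    unused = round(old_hold - actual_cost, 2)  # compare in whole cents
    if unused > 0:
        rows.append(("refund", unused))        # +$unused
    elif unused < 0:
        rows.append(("gpu_usage", unused))     # -$overage (already negative)
    if next_hold > 0:
        rows.append(("gpu_usage", -round(next_hold, 2)))  # -$next_hold
    return rows
```

A full 24h cycle yields a single row, for example `cycle_ledger(240.0, 240.0, 240.0)` gives `[("gpu_usage", -240.0)]`.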

See Balance and transactions for more on reading the ledger.

Warnings as balance runs low

As your balance approaches zero (with one or more running instances), EcoLink sends progressive notifications based on burn rate (sum of hourly rates for all your running resources):

  • 60 min remaining — warning
  • 30 min remaining — warning
  • 20 min remaining — warning
  • 10 min remaining — critical
  • 5 min remaining — critical
  • 3 min remaining — critical
  • 1 min remaining — critical

Notifications appear on the bell icon in the console. They give you time to recharge or adjust usage before instances are terminated.
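The thresholds above can be checked with a short sketch. It assumes "time remaining" is estimated as balance divided by burn rate, and that each threshold fires once when it is crossed. Neither detail is stated on this page. The function names are hypothetical.

```python
# Sketch of burn-rate-based low-balance warnings.
# ASSUMPTION: minutes remaining = balance / burn rate; each threshold fires once.
import math
from typing import Iterable, List, Tuple

# Thresholds (minutes remaining) and severities from the list above.
THRESHOLDS: List[Tuple[int, str]] = [
    (60, "warning"), (30, "warning"), (20, "warning"),
    (10, "critical"), (5, "critical"), (3, "critical"), (1, "critical"),
]


def minutes_remaining(balance: float, hourly_rates: Iterable[float]) -> float:
    """Burn rate is the sum of hourly rates for all running resources."""
    burn = sum(hourly_rates)
    if burn <= 0:
        return math.inf  # nothing running, nothing burning
    return balance * 60 / burn


def crossed(prev_minutes: float, curr_minutes: float) -> List[Tuple[int, str]]:
    """Thresholds passed since the previous check."""
    return [(t, sev) for t, sev in THRESHOLDS
            if curr_minutes <= t < prev_minutes]
```

For example, $5 left with two resources at $3/h each gives 50 minutes remaining, which crosses only the 60-minute warning.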

Terminating an instance manually

If you want to stop before running out of balance:

  1. Console → Model Instances → click instance → Terminate.
  2. Confirm.

The pods are killed in all regions, the hold is charged only for actual uptime, and the unused portion of the hold is refunded.

Or via API:

curl https://api.ecohash.com/inference-instances/142 \
-X DELETE \
-H "Authorization: Bearer eco_YOUR_KEY"
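The same call can be made from Python with only the standard library's `urllib.request`. This sketch mirrors the curl command: same URL, DELETE method, and Bearer header. The function names are hypothetical, and it assumes no behavior beyond what the curl example shows. `urlopen` raises `urllib.error.HTTPError` on non-2xx responses.

```python
# Sketch: terminate an inference instance, mirroring the curl example.
# ASSUMPTION: terminate_request/terminate are illustrative helper names.
import urllib.request

API_BASE = "https://api.ecohash.com"


def terminate_request(instance_id: int, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request."""
    return urllib.request.Request(
        f"{API_BASE}/inference-instances/{instance_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def terminate(instance_id: int, api_key: str) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(terminate_request(instance_id, api_key)) as resp:
        return resp.status

# Usage (reads the key from a hypothetical ECOHASH_API_KEY env var):
#   import os; terminate(142, os.environ["ECOHASH_API_KEY"])
```

Keeping request construction separate from sending makes it easy to inspect or log the request before anything is deleted.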

Deleting a registered model

You can delete a registered model from the Registry, but only if no running inference instance references it. Delete the instance first, then the model.

Edge cases

What if my account runs out of balance mid-cycle? The instance runs until the partial hold is consumed, then stops. You can follow the timeline in Messages and in the Billing transaction history.

What if I stop using my account for a while (say, a week)? A running instance keeps running and billing regardless of account activity. To stop costs, terminate the instance. The registered model entry stays, and you can re-launch with the same config when ready.

What if one region fails but others are healthy? Traffic automatically fails over to healthy regions (see Calling your endpoint). Billing continues per running pod in each region.