Rate limits
EcoLink enforces per-API-key rate limits to protect the platform from runaway loops and to share capacity fairly across tenants.
How to tell
When you hit a limit, the response is:
HTTP 429 Too Many Requests
{ "error": "rate_limit_exceeded", "message": "Too many requests — please slow down" }
The response also carries headers describing your current rate-limit state:
| Header | Meaning |
|---|---|
| x-ratelimit-limit | Requests allowed in the current window |
| x-ratelimit-remaining | Requests left in the current window |
| x-ratelimit-reset | Seconds until the window resets |
| retry-after | Seconds to wait before retrying |
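To act on these headers programmatically (for example, to slow down before `x-ratelimit-remaining` reaches zero), you can parse them into a small structure. The following is a minimal sketch that assumes the header values are plain numbers, as the table describes. `RateLimitState` and `parse_rate_limit_headers` are hypothetical names, not part of any EcoLink SDK. Missing or non-numeric values come back as `None`.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


# Hypothetical helper, not an EcoLink SDK API; assumes numeric header values.
@dataclass
class RateLimitState:
    limit: Optional[int]            # x-ratelimit-limit
    remaining: Optional[int]        # x-ratelimit-remaining
    reset_seconds: Optional[float]  # x-ratelimit-reset
    retry_after: Optional[float]    # retry-after (typically sent with a 429)


def _num(headers: Mapping[str, str], name: str, cast: Callable):
    """Return headers[name] converted with `cast`, or None if absent/unparseable."""
    value = headers.get(name)
    if value is None:
        return None
    try:
        return cast(value)
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitState:
    # Normalize names to lowercase so "X-RateLimit-Limit" and "x-ratelimit-limit" match.
    lowered = {k.lower(): v for k, v in headers.items()}
    return RateLimitState(
        limit=_num(lowered, "x-ratelimit-limit", int),
        remaining=_num(lowered, "x-ratelimit-remaining", int),
        reset_seconds=_num(lowered, "x-ratelimit-reset", float),
        retry_after=_num(lowered, "retry-after", float),
    )
```

With `requests`, you can pass `resp.headers` directly; its case-insensitive mapping supports `.items()`.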
Strategy: exponential backoff
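The recommended client-side practice is to retry 429s with exponential backoff plus random jitter, and to honor `retry-after` whenever the server sends it: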
```python
import time
from random import random

import requests


def call_with_backoff(url, headers, json_body, max_retries=5):
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=json_body, timeout=60)
        if resp.status_code != 429 or attempt == max_retries - 1:
            return resp  # success, non-429 error, or out of retries
        # 429: back off. Prefer Retry-After (seconds); else 1, 2, 4, ... s.
        try:
            delay = float(resp.headers["retry-after"])
        except (KeyError, ValueError):
            delay = 2 ** attempt
        time.sleep(delay + random())  # add up to 1 s of jitter
```
Limits for pay-as-you-go accounts
Pay-as-you-go accounts are gated by a single per-API-key cap that prevents runaway loops or leaked keys from saturating the platform:
| Dimension | Limit |
|---|---|
| Requests per minute, per API key | 200 |
| Tokens per minute | not enforced as a separate cap; per-request token limits are governed by the model's context window |
| Concurrent in-flight requests | not explicitly capped |
This is generous for interactive development and most small-to-mid production workloads. If you're saturating 200 RPM on a single API key, the simplest fixes are:
- Use multiple API keys for separately-tracked services (e.g., production app vs analytics batch job). Each gets its own 200 RPM budget.
- Batch your requests where possible (for example, embed 32–256 strings per request, or send multi-message chat requests).
- Reach out in the `#ecolink-support` Slack channel. For sustained high-throughput use cases, ops can raise the per-key cap on your account.
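If a single process bursts past the cap, client-side throttling can keep it under 200 RPM before the server ever returns a 429. Below is a minimal sketch of a sliding-window limiter. `SlidingWindowLimiter` is a hypothetical name, not an EcoLink API, and it assumes the server counts requests per 60-second window. Allowing at most 200 request starts in any rolling 60 s span also satisfies a fixed per-minute window, but clock skew and network latency mean the margin is approximate. The limiter only coordinates threads inside one process. Separate processes sharing a key would need shared state.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Blocks callers so at most `max_calls` start within any `period`-second span.

    Hypothetical client-side helper; defaults assume the 200 RPM per-key cap.
    """

    def __init__(self, max_calls=200, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._starts = deque()  # timestamps of recently started calls
        self._lock = threading.Lock()

    def acquire(self):
        while True:
            with self._lock:
                now = self._clock()
                # Forget calls that have left the window.
                while self._starts and now - self._starts[0] >= self.period:
                    self._starts.popleft()
                if len(self._starts) < self.max_calls:
                    self._starts.append(now)
                    return
                # Full: wait until the oldest call leaves the window.
                wait = self.period - (now - self._starts[0])
            self._sleep(wait)  # sleep outside the lock so other threads can check
```

Call `limiter.acquire()` immediately before each request. The `clock` and `sleep` parameters exist so the limiter can be tested without real waiting.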
Enterprise rate limits
Enterprise contracts include negotiated rate-limit caps tied to your committed usage. Reach out to sales if you need predictable high-throughput limits as part of a production SLA.
Avoiding 429s
- Batch your requests where possible. Sending embeddings in batches of 32–256 strings is much more efficient than sending them one at a time.
- Stream long completions so you don't hold open many sockets unnecessarily.
- Respect `retry-after`. Don't spam-retry through the limit window; that only extends the throttle.
- Use a dedicated key per service, so a runaway script can't exhaust the budget that production traffic depends on.
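Batching directly reduces your request count against the per-minute cap. At 128 strings per request, 10,000 strings become 79 requests instead of 10,000. Below is a minimal sketch of a chunking helper. `batched` is a hypothetical name, and the default of 128 is simply a value inside the 32–256 range suggested above, not a documented limit. The embedding call itself is left out because its client API isn't specified here.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int = 128) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

For example, `for batch in batched(texts): send_embedding_request(batch)`, where `send_embedding_request` stands in for whatever hypothetical client call you use.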
Other status codes that look like rate limits
Don't confuse 429 with these:
- 503 Service Unavailable: the target model has no healthy pod right now. Retry after a few seconds; this is not a tenant-specific rate limit.
- 402 Payment Required: your balance is insufficient. This is not a rate limit, and no amount of waiting fixes it until the account is credited.
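These distinctions lead to different retry decisions. Below is a minimal sketch of one possible policy. It assumes the lowercase header names from the table in "How to tell". `retry_delay` is a hypothetical helper, and the 503 schedule (2 s per attempt, capped at 10 s) is an illustrative reading of "a few seconds", not a documented value.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int) -> Optional[float]:
    """Return seconds to wait before retrying, or None if retrying won't help.

    Hypothetical policy; the 503 timings are illustrative assumptions.
    """
    if status == 429:
        # Rate limited: honor retry-after, else back off exponentially (1, 2, 4, ... s).
        try:
            return float(headers["retry-after"])
        except (KeyError, ValueError):
            return float(2 ** attempt)
    if status == 503:
        # No healthy model pod right now; not tenant-specific, so retry shortly.
        return min(2.0 * (attempt + 1), 10.0)
    # 402 (insufficient balance) and everything else: waiting won't fix it.
    return None
```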