Embeddings

EcoLink's embedding endpoint, POST /v1/embeddings, is OpenAI-compatible.

Basic request

curl https://api.ecohash.com/v1/embeddings \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-bge-large-en-v1.5",
    "input": "EcoLink makes GPU compute simple."
  }'

Response:

{
  "object": "list",
  "data": [{
    "object": "embedding",
    "index": 0,
    "embedding": [0.012, -0.004, 0.231, ...]
  }],
  "model": "text-embedding-bge-large-en-v1.5",
  "usage": { "prompt_tokens": 9, "total_tokens": 9 }
}
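Each item in data carries an index giving the position of the input it belongs to, which matters once you send several strings at once. As a minimal sketch (Python, standard library only), the hypothetical helper embeddings_from_response below takes a decoded response body, such as the result of json.loads on the JSON above, and returns the vectors in input order. Sorting by index is a defensive choice, not something this page says is required.

```python
from typing import Any


def embeddings_from_response(body: dict[str, Any]) -> list[list[float]]:
    """Hypothetical helper: return the vectors in input order using each item's "index"."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Tiny hand-written response body; real vectors have 1024 floats each.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.6, 0.8]},
    ],
    "model": "text-embedding-bge-large-en-v1.5",
    "usage": {"prompt_tokens": 9, "total_tokens": 9},
}
vectors = embeddings_from_response(sample)
print(vectors)  # [[0.6, 0.8], [0.3, 0.4]]
```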

Batch

Pass an array to embed multiple strings in one request:

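from openai import OpenAI

# Point the official OpenAI SDK at EcoLink's OpenAI-compatible API.
client = OpenAI(api_key="eco_...", base_url="https://api.ecohash.com/v1")

resp = client.embeddings.create(
    model="text-embedding-bge-large-en-v1.5",
    input=[  # a list of strings is embedded in a single request
        "quantum entanglement",
        "ham and eggs",
        "the sun rises in the east",
    ],
)
# One vector per input string; each item's .index gives its input position.
vectors = [d.embedding for d in resp.data]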

Batching gives roughly 10× the throughput of sending one string per request. A good batch size is 32–256 strings, depending on the total token count.
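A sketch of client-side batching under stated assumptions: embed_all and openai_embed_batch are hypothetical helpers, not part of the OpenAI SDK or EcoLink's API. The default batch_size=128 is simply a value inside the suggested 32–256 range, and the helper does not check token counts per batch. openai_embed_batch wraps the same client.embeddings.create call used in the Batch example.

```python
from typing import Callable, Iterator, Sequence

Vector = list[float]


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list[Vector]],
    batch_size: int = 128,  # hypothetical default, inside the 32-256 range
) -> list[Vector]:
    """Hypothetical helper: embed `texts` batch by batch, preserving input order."""
    vectors: list[Vector] = []
    for batch in chunks(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def openai_embed_batch(client, model: str = "text-embedding-bge-large-en-v1.5"):
    """Hypothetical adapter: turn an OpenAI-compatible client into an embed_batch function."""
    def embed_batch(batch: Sequence[str]) -> list[Vector]:
        resp = client.embeddings.create(model=model, input=list(batch))
        # Sort by index so vectors line up with the inputs of this batch.
        return [d.embedding for d in sorted(resp.data, key=lambda d: d.index)]
    return embed_batch
```

With the client from the Batch example, embed_all(texts, openai_embed_batch(client)) returns one vector per text, in order. Passing the per-batch function in as a parameter keeps the batching logic testable without network access.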

Dimensions

text-embedding-bge-large-en-v1.5: 1024 floats per vector

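If your vector store expects a smaller dimension (for example, an existing index built for 768-dimensional vectors), you can truncate each vector to its first N components. Truncation breaks unit length, so re-normalize the truncated vector before storing it. Truncation also discards information: expect some drop in retrieval quality, larger the more dimensions you remove, and check results on your own data before committing to a smaller dimension.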
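A minimal sketch of truncation followed by re-normalization, assuming you hold each vector as a plain Python list; truncate_and_normalize is a hypothetical helper name. Dropping components shortens the vector, so it has to be rescaled to unit length before cosine or dot-product search.

```python
import math
from typing import Sequence


def truncate_and_normalize(vec: Sequence[float], dim: int) -> list[float]:
    """Hypothetical helper: keep the first `dim` components, then rescale to unit L2 length."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector has zero length")
    return [x / norm for x in head]


# Toy 4-dimensional vector; a real BGE vector has 1024 components.
v = [0.5, 0.5, 0.5, 0.5]
short = truncate_and_normalize(v, 2)
print(short)
```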

Typical uses

Semantic search: embed your documents and the query, then rank documents by cosine similarity to the query
RAG: embed document chunks, retrieve the top-k chunks by similarity, and pass them to an LLM as context
Clustering: embed a corpus, then run K-means or HDBSCAN on the vectors
Deduplication: flag document pairs with very high similarity as near-duplicates
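The similarity step behind semantic search and deduplication can be sketched in plain Python as follows. cosine, top_k and near_duplicates are hypothetical helper names, and the 0.95 duplicate threshold is an arbitrary placeholder to tune on your own data. The pairwise loop is O(n²), which is fine for small corpora; larger ones are normally handled by a vector store's own search.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query: Sequence[float], docs: Sequence[Sequence[float]], k: int) -> list[tuple[int, float]]:
    """Hypothetical helper: (doc_index, score) for the k docs most similar to the query."""
    scored = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def near_duplicates(docs: Sequence[Sequence[float]], threshold: float = 0.95) -> list[tuple[int, int]]:
    """Hypothetical helper: index pairs (i, j), i < j, with similarity >= threshold."""
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(docs[i], docs[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Toy 2-D vectors standing in for real 1024-D embeddings.
docs = [[1.0, 0.0], [0.0, 1.0], [0.99, 0.05]]
query = [1.0, 0.1]
print(top_k(query, docs, k=2))
print(near_duplicates(docs))  # [(0, 2)]
```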

Cost

Embeddings bill per input token. A 1K-token document typically costs a fraction of a cent to embed.
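Because billing is per input token, cost is plain arithmetic on the token count the API reports in usage.total_tokens. This page does not state a rate, so the price below is a hypothetical placeholder and the per-million-tokens unit is an assumption; substitute EcoLink's actual pricing.

```python
def estimate_cost(total_tokens: int, price_per_million_tokens: float) -> float:
    """Cost in the price's currency: tokens / 1,000,000 * price."""
    if total_tokens < 0 or price_per_million_tokens < 0:
        raise ValueError("tokens and price must be non-negative")
    return total_tokens / 1_000_000 * price_per_million_tokens


# HYPOTHETICAL rate for illustration only, not EcoLink's real price.
PRICE_PER_MILLION = 0.02  # placeholder: USD per 1M input tokens

# For a real request you would pass resp.usage.total_tokens here.
print(estimate_cost(1_000, PRICE_PER_MILLION))
```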

Tips

Normalize before storing. The model returns approximately, but not exactly, unit-length vectors. Divide each vector by its L2 norm ||v|| if your vector store assumes unit length.
Batch your requests. Billing is per input token, so batching mainly saves time: one call with 100 strings avoids the overhead of 99 extra round trips compared with 100 single-string calls.
Use the same model on both sides. If you embed documents with model X, always embed queries with model X. Vectors from different models live in different spaces, so similarity scores between them are meaningless.
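The first and third tips can be combined in one small pattern, sketched below: normalize each vector before storing it and keep the model name next to it, so a comparison across models fails loudly instead of returning a meaningless score. StoredVector, l2_normalize and similarity are hypothetical names for illustration; most vector stores have their own way of recording such metadata.

```python
import math
from dataclasses import dataclass
from typing import Sequence

MODEL = "text-embedding-bge-large-en-v1.5"


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Divide by ||v|| so the stored vector has unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


@dataclass(frozen=True)
class StoredVector:
    """Hypothetical record: a normalized embedding tagged with the model that produced it."""
    model: str
    values: tuple[float, ...]

    @classmethod
    def from_raw(cls, model: str, raw: Sequence[float]) -> "StoredVector":
        return cls(model=model, values=tuple(l2_normalize(raw)))


def similarity(a: StoredVector, b: StoredVector) -> float:
    """Dot product of unit vectors (equal to cosine similarity); refuses mixed models."""
    if a.model != b.model:
        raise ValueError(f"model mismatch: {a.model!r} vs {b.model!r}")
    return sum(x * y for x, y in zip(a.values, b.values))


# Toy 2-D vectors; real ones come from the embeddings endpoint.
doc = StoredVector.from_raw(MODEL, [3.0, 4.0])
query = StoredVector.from_raw(MODEL, [3.0, 4.0])
print(similarity(doc, query))
```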

Related

Reranker: for improving search precision after an embedding retrieval.
Chat completions: for the generation step in RAG.
