Embeddings

EcoLink's embeddings endpoint is POST /v1/embeddings. It is OpenAI-compatible, so OpenAI client libraries work when pointed at the EcoLink base URL.

Basic request

curl https://api.ecohash.com/v1/embeddings \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-bge-large-en-v1.5",
    "input": "EcoLink makes GPU compute simple."
  }'

Response:

{
  "object": "list",
  "data": [{
    "object": "embedding",
    "index": 0,
    "embedding": [0.012, -0.004, 0.231, ...]
  }],
  "model": "text-embedding-bge-large-en-v1.5",
  "usage": { "prompt_tokens": 9, "total_tokens": 9 }
}

Batch

Pass an array to embed multiple strings in one request:

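from openai import OpenAI

# Point the OpenAI client at EcoLink's OpenAI-compatible API.
client = OpenAI(api_key="eco_...", base_url="https://api.ecohash.com/v1")

# Passing a list embeds every string in a single request.
resp = client.embeddings.create(
    model="text-embedding-bge-large-en-v1.5",
    input=[
        "quantum entanglement",
        "ham and eggs",
        "the sun rises in the east",
    ],
)

# One vector per input string.
vectors = [d.embedding for d in resp.data]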

Batching gives roughly 10× the throughput of sending one string per request. A good batch size is 32–256 strings, depending on the total token count.
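For a corpus larger than one batch, the request can be split into fixed-size chunks. The sketch below is one way to do that, not an official EcoLink helper: `batched` and `embed_all` are hypothetical names, the default of 128 is an arbitrary pick from the 32–256 range above, and `client` is assumed to be an OpenAI-style client like the one in the batch example.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int = 128) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `batch_size` strings."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_all(client, texts, model="text-embedding-bge-large-en-v1.5", batch_size=128):
    """Embed `texts` batch by batch and return the vectors in input order.

    `client` is assumed to expose `embeddings.create(model=..., input=...)`.
    """
    vectors = []
    for batch in batched(texts, batch_size):
        resp = client.embeddings.create(model=model, input=batch)
        # Sort by `index` so each vector lines up with its input string.
        ordered = sorted(resp.data, key=lambda d: d.index)
        vectors.extend(d.embedding for d in ordered)
    return vectors
```

If the strings vary a lot in length, a smaller batch size keeps the total token count per request down.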

Dimensions

  • text-embedding-bge-large-en-v1.5: 1024 floats per vector

If you need a specific dimension for your vector store (e.g., 768 for Pinecone starter), you can truncate each vector to its first k components and re-normalize it to unit L2 length. Truncated vectors retain most of the semantic information, but retrieval quality drops slightly, and drops further below 768 dimensions, so check it against your own data.
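A minimal sketch of truncation, using only the standard library: keep the first `dims` components, then rescale so the result has unit L2 length. The function name `truncate_and_normalize` is illustrative and not part of any EcoLink or OpenAI SDK.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(vector: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` components of `vector` and rescale to unit L2 norm."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero vector cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]
```

For example, `truncate_and_normalize(resp.data[0].embedding, 768)` would turn a 1024-float vector into a unit-length 768-float vector.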

Typical uses

  • Semantic search: embed your documents and the query, then rank the documents by cosine similarity to the query
  • RAG: embed document chunks, retrieve the top-k by similarity, and feed them as context to an LLM
  • Clustering: embed a corpus, then run K-means or HDBSCAN on the vectors
  • Deduplication: flag near-duplicate documents whose embeddings have high cosine similarity
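The retrieval step shared by semantic search and RAG can be sketched in plain Python as a brute-force cosine-similarity ranking. The names `cosine_similarity` and `top_k` are illustrative. A production system would more likely hand this to a vector store, so treat it as a way to see the mechanics on a small corpus.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between `a` and `b`; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          doc_vecs: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs for the `k` most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With `doc_vecs` holding the document embeddings and `query_vec` the embedded query, the indices returned by `top_k` identify which documents (or chunks, for RAG) to show or pass to the LLM.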

Cost

Embeddings bill per input token. A 1K-token document typically costs a fraction of a cent to embed.
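Because billing is per input token, a cost estimate is a single multiplication. The sketch below deliberately takes the price as an argument: this page does not state EcoLink's rate, so `usd_per_million_tokens` is a value you supply yourself, and the function name is hypothetical.

```python
def estimate_embedding_cost(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Estimate the cost in USD of embedding `total_tokens` input tokens.

    `usd_per_million_tokens` is caller-supplied; no real EcoLink price is assumed.
    """
    if total_tokens < 0 or usd_per_million_tokens < 0:
        raise ValueError("token count and price must be non-negative")
    return total_tokens / 1_000_000 * usd_per_million_tokens
```

The token count can come from the `usage.total_tokens` field of a response, summed across requests.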

Tips

  • Normalize before storing. The model returns vectors that are approximately, but not exactly, unit length. Divide each vector by its L2 norm ||v|| if your vector store assumes unit length.
  • Batch your requests. One call with 100 strings is much more efficient than 100 calls with 1 string: billing is per token either way, but you avoid the per-request overhead.
  • Same model both sides. If you embed documents with model X, always embed queries with model X. Mixing models produces nonsense similarity.

See also

  • Reranker: improves search precision after an embedding retrieval.
  • Chat completions: handles the generation step in RAG.