Embeddings
EcoLink exposes an OpenAI-compatible embeddings endpoint at POST /v1/embeddings.
Basic request
curl https://api.ecohash.com/v1/embeddings \
  -H "Authorization: Bearer eco_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-bge-large-en-v1.5",
    "input": "EcoLink makes GPU compute simple."
  }'
Response:
{
  "object": "list",
  "data": [{
    "object": "embedding",
    "index": 0,
    "embedding": [0.012, -0.004, 0.231, ...]
  }],
  "model": "text-embedding-bge-large-en-v1.5",
  "usage": { "prompt_tokens": 9, "total_tokens": 9 }
}
Batch
Pass an array to embed multiple strings in one request:
from openai import OpenAI

client = OpenAI(api_key="eco_...", base_url="https://api.ecohash.com/v1")

resp = client.embeddings.create(
    model="text-embedding-bge-large-en-v1.5",
    input=[
        "quantum entanglement",
        "ham and eggs",
        "the sun rises in the east",
    ],
)
vectors = [d.embedding for d in resp.data]
Batching gives roughly 10× the throughput of sending one string per request. A good batch size is 32–256 strings, depending on the total token count.
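For a corpus larger than one batch, the list can be split into fixed-size chunks and embedded chunk by chunk. The following is a minimal sketch, not an official client feature: `chunked`, `embed_all`, and the `embed_batch` callback are hypothetical names introduced here, and the default batch size of 64 is simply a value inside the 32–256 range mentioned above.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 64,  # assumption: any value in the 32-256 range
) -> List[Vector]:
    """Embed `texts` batch by batch, preserving the input order.

    `embed_batch` is a caller-supplied function (hypothetical) that sends
    one batch to the embeddings endpoint and returns one vector per string.
    """
    vectors: List[Vector] = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```

With the OpenAI client shown earlier, `embed_batch` could be a small wrapper that calls `client.embeddings.create(model=..., input=batch)` and returns `[d.embedding for d in resp.data]`.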
Dimensions
text-embedding-bge-large-en-v1.5: 1024 floats per vector
If your vector store requires a specific dimension (e.g., 768 for Pinecone starter), you can truncate: keep the first N components and re-normalize the result to unit L2 length. BGE embeddings are approximately normalized, so vectors truncated this way retain most semantic information, though retrieval quality drops slightly once you go below 768 dimensions.
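Truncation can be sketched as follows, assuming it means keeping the leading components and rescaling the result to unit L2 length. The function name `truncate_and_renormalize` is hypothetical, and whether the quality loss is acceptable should be checked against your own retrieval data.

```python
import math
from typing import List, Sequence


def truncate_and_renormalize(vector: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale them to unit L2 length."""
    if not 1 <= dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Toy 3-dimensional example; a real vector would have 1024 components.
print(truncate_and_renormalize([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```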
Typical uses
- Semantic search: embed your documents, embed the query, and rank the documents by cosine similarity to the query
- RAG: embed document chunks, retrieve the top-k by similarity, and feed them as context to an LLM
- Clustering: embed a corpus, then run K-means or HDBSCAN on the vectors
- Deduplication: find near-duplicate documents by their high similarity
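The ranking step shared by semantic search and RAG can be sketched in plain Python. This is an illustrative brute-force version with hypothetical helper names (`cosine_similarity`, `top_k`); a real deployment would normally delegate this to a vector store.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    documents: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scored = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(documents)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For RAG, the indices returned by `top_k` would select the chunks to place in the LLM prompt. For deduplication, the same similarity function can be applied pairwise with a high threshold.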
Cost
Embeddings bill per input token. A 1K-token document typically costs a fraction of a cent to embed.
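Because billing is per input token, a cost estimate is a single multiplication. The sketch below assumes a per-million-token price that you supply yourself. The rate used in the example is a placeholder, not EcoLink's actual pricing, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Estimated cost in USD for embedding `total_tokens` input tokens."""
    if total_tokens < 0 or usd_per_million_tokens < 0:
        raise ValueError("token count and price must be non-negative")
    return total_tokens / 1_000_000 * usd_per_million_tokens


# PLACEHOLDER rate for illustration only; look up the real price.
example_rate = 0.02
print(f"{estimate_cost(1_000, example_rate):.6f} USD")
```

The token count for a completed request is reported in the response's `usage.total_tokens` field.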
Tips
- Normalize before storing. The model returns approximately unit-length vectors but not exactly. Divide by ||v|| if your vector store assumes unit length.
- Batch your requests. One call with 100 strings is much cheaper (per token) than 100 calls with 1 string.
- Same model both sides. If you embed documents with model X, always embed queries with model X. Mixing models produces nonsense similarity.
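The normalization and same-model tips can be enforced together at the storage layer. The following is a sketch using a hypothetical in-memory `VectorIndex` class, written for illustration rather than taken from any EcoLink SDK: it normalizes every vector on insert and rejects vectors tagged with a different model name.

```python
import math
from typing import Dict, List, Sequence


def normalize(vector: Sequence[float]) -> List[float]:
    """Divide by the L2 norm ||v|| so the result has unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class VectorIndex:
    """Hypothetical in-memory store bound to one embedding model."""

    def __init__(self, model: str) -> None:
        self.model = model
        self.vectors: Dict[str, List[float]] = {}

    def _check_model(self, model: str) -> None:
        if model != self.model:
            raise ValueError(f"index uses {self.model!r}, got {model!r}")

    def add(self, doc_id: str, vector: Sequence[float], model: str) -> None:
        self._check_model(model)
        self.vectors[doc_id] = normalize(vector)

    def best_match(self, query: Sequence[float], model: str) -> str:
        """Return the id of the stored vector most similar to `query`."""
        self._check_model(model)
        q = normalize(query)
        # For unit vectors, the dot product equals cosine similarity.
        return max(
            self.vectors,
            key=lambda doc_id: sum(x * y for x, y in zip(q, self.vectors[doc_id])),
        )
```

Tagging each call with the model name costs one extra argument and turns a silent quality problem (mixed-model similarities) into an immediate error.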
Related
- Reranker — for improving search precision after an embedding retrieval.
- Chat completions — for the generation step in RAG.