Core Concepts

Embex reduces vector database interactions to three main concepts: Collections, Vectors, and Search.

A Collection is a container for your vectors. It’s similar to a “table” in a SQL database.

  • Name: A unique identifier (e.g., “users”, “products”).
  • Dimension: The length of each vector. This must match the output size of your embedding model (e.g., 384 for all-MiniLM-L6-v2, 1536 for OpenAI’s text-embedding-3-small).

# Create a collection for MiniLM embeddings (384 dimensions)
await client.create_collection("products", dimension=384)

# List all collections
collections = await client.list_collections()

# Delete a collection
await client.delete_collection("products")
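Since the dimension is set when the collection is created, it can help to check vector lengths on the client side before inserting. The following is a minimal sketch, not part of Embex: `check_dimension` is a hypothetical helper, and how Embex itself reports a mismatch is not covered here.

```python
from typing import Sequence


# Hypothetical helper (not part of Embex): compare a vector's length with
# the dimension the collection was created with.
def check_dimension(vector: Sequence[float], expected: int) -> None:
    if len(vector) != expected:
        raise ValueError(
            f"vector has {len(vector)} dimensions, collection expects {expected}"
        )


# A 384-dimensional vector fits a collection created with dimension=384.
check_dimension([0.0] * 384, expected=384)

# A 1536-dimensional vector does not.
try:
    check_dimension([0.0] * 1536, expected=384)
except ValueError as err:
    print(err)
```

Running a check like this before `client.insert` surfaces a model/collection mismatch early, with a message that names both sizes.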

A Vector is the core data unit. It represents an object (text, image, audio) as a list of numbers generated by an embedding model.

Field      Type          Description
id         String        Unique identifier for the record.
vector     List[Float]   The embedding array (from your model).
metadata   Map           Optional JSON key-value pairs (e.g., source text, tags).

from embex import Vector

# Vector usually comes from a model:
# vector = model.encode("Super Widget").tolist()
vec = Vector(
    id="prod_123",
    vector=[-0.12, 0.05, 0.88, ...],  # 384 floats
    metadata={
        "name": "Super Widget",
        "price": 99.99,
        "category": "electronics"
    }
)

await client.insert("products", [vec])
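To make the three fields concrete, here is a small sketch that builds several records from source data. Everything in it is an assumption made for illustration: `Record` is a stand-in dataclass (not `embex.Vector`), `fake_embed` is a placeholder for a real model call such as `model.encode(text).tolist()`, and the second product is made up.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any

DIMENSION = 8  # kept small for illustration; a real model fixes this (e.g., 384)


def fake_embed(text: str, dim: int = DIMENSION) -> list[float]:
    """Placeholder for a real embedding model: deterministic, not semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map the first `dim` bytes of the hash to floats in [-1.0, 1.0].
    return [b / 127.5 - 1.0 for b in digest[:dim]]


@dataclass
class Record:
    """Stand-in that mirrors the id / vector / metadata fields."""
    id: str
    vector: list[float]
    metadata: dict[str, Any] = field(default_factory=dict)


def build_records(items: dict[str, dict[str, Any]]) -> list[Record]:
    records = []
    for item_id, meta in items.items():
        json.dumps(meta)  # raises TypeError if the metadata is not JSON-serializable
        records.append(
            Record(id=item_id, vector=fake_embed(meta["name"]), metadata=meta)
        )
    return records


records = build_records({
    "prod_123": {"name": "Super Widget", "price": 99.99, "category": "electronics"},
    "prod_124": {"name": "Mega Gadget", "price": 149.0, "category": "electronics"},  # made-up item
})
print(len(records), len(records[0].vector))
```

In real use you would swap `fake_embed` for your embedding model and pass the resulting objects to `client.insert` in one batch.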

Search finds the vectors most similar to a query vector. You generate a vector for your query text with the same embedding model used for the stored vectors, and Embex returns the nearest neighbors.

# query_vector = model.encode("smartphone").tolist()
results = await client.search(
    collection_name="products",
    vector=query_vector,
    limit=5
)
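Conceptually, a nearest-neighbor search scores every stored vector against the query and keeps the best `limit` results. The sketch below does this by brute force with cosine similarity. It illustrates the idea only: the similarity metric is an assumption, and vector databases typically use index structures rather than scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query: list[float], stored: dict[str, list[float]], limit: int = 5):
    """Return up to `limit` (id, score) pairs, most similar first."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Tiny 2-dimensional example so the geometry is easy to follow.
stored = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.7, 0.7],
}
top = nearest([1.0, 0.1], stored, limit=2)
print([vec_id for vec_id, _ in top])  # → ['a', 'c']
```

The query points almost along the first axis, so "a" ranks first, the diagonal vector "c" second, and "b" is cut off by the limit.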

You can refine search results using metadata filters. Embex uses a structured filter syntax:

results = await client.search(
    collection_name="products",
    vector=query_vector,
    limit=5,
    filter={
        # Keep only records whose category is "electronics"
        "must": [
            {"key": "category", "match": {"value": "electronics"}}
        ],
        # Exclude records priced above 1000.0
        "must_not": [
            {"key": "price", "range": {"gt": 1000.0}}
        ]
    }
)
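One way to reason about a filter like this is to evaluate it by hand against a record's metadata. The sketch below is not Embex code. It assumes that every `must` condition has to hold, that no `must_not` condition may hold, and that a condition on a missing key never holds. Only the `gt` range operator appears above; `gte`, `lt`, and `lte` are assumed by analogy.

```python
from typing import Any

# Range operators: only "gt" comes from the example above, the rest are assumed.
RANGE_OPS = {
    "gt": lambda value, bound: value > bound,
    "gte": lambda value, bound: value >= bound,
    "lt": lambda value, bound: value < bound,
    "lte": lambda value, bound: value <= bound,
}


def condition_holds(metadata: dict[str, Any], cond: dict[str, Any]) -> bool:
    """Evaluate one {"key": ..., "match"/"range": ...} condition."""
    value = metadata.get(cond["key"])
    if value is None:
        return False  # assumption: a missing key satisfies no condition
    if "match" in cond:
        return value == cond["match"]["value"]
    if "range" in cond:
        return all(RANGE_OPS[op](value, bound) for op, bound in cond["range"].items())
    raise ValueError(f"unsupported condition: {cond}")


def passes(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """True if all "must" conditions hold and no "must_not" condition holds."""
    must_ok = all(condition_holds(metadata, c) for c in flt.get("must", []))
    must_not_ok = not any(condition_holds(metadata, c) for c in flt.get("must_not", []))
    return must_ok and must_not_ok


flt = {
    "must": [{"key": "category", "match": {"value": "electronics"}}],
    "must_not": [{"key": "price", "range": {"gt": 1000.0}}],
}

print(passes({"category": "electronics", "price": 99.99}, flt))   # → True
print(passes({"category": "electronics", "price": 1500.0}, flt))  # → False
print(passes({"category": "toys", "price": 20.0}, flt))           # → False
```

Read this way, the filter above selects electronics priced at 1000.0 or less, since a price of exactly 1000.0 is not greater than the bound.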