Post

Vector Databases with Python: A Practical Guide

Learn how vector databases work and how to use them in Python with FAISS, Chroma, and Pinecone. Covers embeddings, similarity search, indexing strategies, and a complete semantic search example.

Vector Databases with Python: A Practical Guide

Why Vector Databases Exist

Traditional databases search by exact match or range — WHERE price < 100. They can’t answer “find me documents similar in meaning to this one.” That requires comparing high-dimensional vectors (embeddings), and doing it fast across millions of rows needs specialized indexing. That’s what a vector database is for.

If you’ve worked through RAG with Python, you’ve already used a vector store. This guide goes deeper: how similarity search actually works, and how to choose between FAISS, Chroma, and Pinecone.

1
2
3
4
5
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embedding = model.encode("Python is great for data science")
print(embedding.shape)  # (384,)
1
(384,)

That 384-number array is the “meaning” of the sentence in vector space. Similar sentences produce vectors that are close together — measured with cosine similarity or Euclidean distance.

Installation

1
pip install sentence-transformers faiss-cpu chromadb pinecone-client numpy

Option 1: FAISS — Local and Free

FAISS (Facebook AI Similarity Search) runs entirely in-process. No server, no API key — ideal for prototyping or small-to-medium datasets.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
import faiss
import numpy as np

docs = [
    "Python is a popular programming language for AI",
    "Cats are independent pets that sleep a lot",
    "Machine learning models need large training datasets",
    "Dogs are loyal companions that need daily walks",
]

embeddings = model.encode(docs)
dimension = embeddings.shape[1]

index = faiss.IndexFlatL2(dimension)
index.add(np.array(embeddings).astype("float32"))

query = model.encode(["What language is used for machine learning?"])
distances, indices = index.search(np.array(query).astype("float32"), k=2)

for i, dist in zip(indices[0], distances[0]):
    print(f"{docs[i]}  (distance={dist:.3f})")
1
2
Python is a popular programming language for AI  (distance=0.812)
Machine learning models need large training datasets  (distance=1.103)

IndexFlatL2 checks every vector — exact but slow at scale. For millions of vectors, swap in an approximate index:

1
2
3
4
5
6
7
nlist = 100  # number of clusters
quantizer = faiss.IndexFlatL2(dimension)
index_ivf = faiss.IndexIVFFlat(quantizer, dimension, nlist)

index_ivf.train(np.array(embeddings).astype("float32"))
index_ivf.add(np.array(embeddings).astype("float32"))
index_ivf.nprobe = 10  # how many clusters to search

IndexIVFFlat trades a small amount of accuracy for huge speed gains — this is the same approximate-nearest-neighbor tradeoff that powers production search at scale.

Option 2: Chroma — Local with Metadata Filtering

Chroma adds persistence and metadata filtering on top of a FAISS-like core, with a much simpler API:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.create_collection("articles")

collection.add(
    documents=docs,
    metadatas=[{"category": "tech"}, {"category": "pets"}, {"category": "tech"}, {"category": "pets"}],
    ids=["doc1", "doc2", "doc3", "doc4"],
)

results = collection.query(
    query_texts=["What language is used for machine learning?"],
    n_results=2,
    where={"category": "tech"},  # metadata filter
)

print(results["documents"])
1
[['Python is a popular programming language for AI', 'Machine learning models need large training datasets']]

The where filter is the killer feature: combine semantic search with structured filters like date, category, or user ID — something FAISS can’t do natively.

Option 3: Pinecone — Managed and Scalable

For production workloads with millions of vectors and no infrastructure to manage:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

pc.create_index(
    name="articles",
    dimension=384,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("articles")

vectors = [(f"doc{i}", emb.tolist(), {"text": doc}) for i, (emb, doc) in enumerate(zip(embeddings, docs))]
index.upsert(vectors=vectors)

query_emb = model.encode("What language is used for machine learning?").tolist()
matches = index.query(vector=query_emb, top_k=2, include_metadata=True)

for m in matches["matches"]:
    print(m["metadata"]["text"], m["score"])

Pinecone handles sharding, replication, and scaling automatically — you pay for that convenience, but it removes an entire category of ops work.

Choosing the Right One

NeedChoose
Prototyping, < 1M vectors, no serverFAISS
Local app with metadata filtersChroma
Production scale, managed infraPinecone
Already using Postgrespgvector extension

Building a Semantic Search Endpoint

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    text: str
    top_k: int = 5

@app.post("/search")
def search(query: Query):
    query_emb = model.encode([query.text]).astype("float32")
    distances, indices = index.search(query_emb, k=query.top_k)
    return {
        "results": [
            {"text": docs[i], "score": float(d)}
            for i, d in zip(indices[0], distances[0])
        ]
    }

This is the exact pattern used in production RAG systems: embed the query, search the index, return ranked results to feed into an LLM prompt.

Key Takeaways

  • Vector databases store embeddings and answer “what’s similar to this?” instead of exact-match queries
  • FAISS is best for free, local prototyping; Chroma adds metadata filters; Pinecone handles managed production scale
  • Approximate indexes (IndexIVFFlat) trade a little accuracy for major speed gains at scale
  • Cosine similarity and Euclidean distance are the two standard ways to compare embeddings
  • Combine vector search with metadata filters when you need both semantic relevance and structured constraints
  • This is the retrieval layer behind every RAG pipeline — get it right and the rest of the system gets easier
Khushal Jethava
Khushal Jethava

Machine Learning Engineer at Codiste, specializing in Generative AI, NLP, and Computer Vision. Building production AI systems with Python.

This post is licensed under CC BY 4.0 by the author.