Learning Notes

FAISS

1. What FAISS is

FAISS (Facebook AI Similarity Search) is a library for fast similarity search over vector embeddings. It retrieves the vectors nearest to a query vector using one of several indexing strategies.

2. Core Idea

Workflow: Text → Embedding → Vector Index → Similarity Search → Top-K Results → Map to original text (RAG retrieval)

3. Indexes (Main Search Engine Types)

A. Flat Indexes (Exact Search)

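IndexFlatIP → inner product (equals cosine similarity when vectors are L2-normalized)
IndexFlatL2 → Euclidean distance (FAISS returns squared L2 distances)
Behavior: brute-force scan of all vectors
Pros: exact results, no training
Cons: slow at scale; each query costs O(N·d) for N vectors of dimension d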

B. IVF Indexes (Cluster-Based Approximate Search)

IndexIVFFlat → clusters + exact search inside selected clusters
IndexIVFPQ → IVF + compression (Product Quantization)
IndexIVFScalarQuantizer → IVF + scalar quantization (each dimension stored in a few bits)

Mechanism:

Cluster vectors (k-means)
Assign vectors to centroids
Query searches only nearest clusters (controlled by nprobe)

Tradeoff: speed vs accuracy (higher nprobe scans more clusters, so recall goes up and queries get slower)
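The three steps can be illustrated without FAISS at all. The following is a plain-NumPy sketch of the idea, not how FAISS implements it internally; the helper names (`sq_dists`, `ivf_search`) and all sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, nlist = 16, 2000, 20
xb = rng.random((n, d), dtype=np.float32)

def sq_dists(a, b):
    """Squared Euclidean distances between every row of a and every row of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

# 1. Cluster: a few Lloyd (k-means) iterations starting from random data points.
centroids = xb[rng.choice(n, nlist, replace=False)].copy()
for _ in range(10):
    assign = sq_dists(xb, centroids).argmin(axis=1)
    for c in range(nlist):
        members = xb[assign == c]
        if len(members):                 # leave an empty cluster's centroid unchanged
            centroids[c] = members.mean(axis=0)

# 2. Assign: one inverted list of vector positions per centroid.
assign = sq_dists(xb, centroids).argmin(axis=1)
inverted_lists = [np.flatnonzero(assign == c) for c in range(nlist)]

# 3. Query: scan only the nprobe lists whose centroids are closest to the query.
def ivf_search(q, k, nprobe):
    nearest_lists = sq_dists(q[None, :], centroids)[0].argsort()[:nprobe]
    candidates = np.concatenate([inverted_lists[c] for c in nearest_lists])
    dist = sq_dists(q[None, :], xb[candidates])[0]
    return candidates[dist.argsort()[:k]]

q = rng.random(d, dtype=np.float32)
exact = sq_dists(q[None, :], xb)[0].argsort()[:5]
approx = ivf_search(q, k=5, nprobe=3)     # may miss neighbors in unprobed lists
full = ivf_search(q, k=5, nprobe=nlist)   # probing every list is an exhaustive search
```

With `nprobe=nlist` the candidate set is the whole dataset, so the result matches brute force; smaller values trade some recall for fewer distance computations.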

C. HNSW Index (Graph-Based Search)

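IndexHNSWFlat

Mechanism:

Builds a multi-layer graph linking each vector to its near neighbors
Searches by greedily traversing the graph toward the query

Pros:

Very fast
High recall

Cons:

Memory heavy (graph links are stored alongside the full vectors)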

D. Compression Indexes

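IndexPQ → Product Quantization: splits each vector into m sub-vectors and stores one short code per sub-vector
IndexIVFPQ → IVF clustering + PQ compression of the stored vectors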

Purpose:

Reduce memory footprint
Store compressed vector codes instead of full floats
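To make the memory saving concrete, here is a back-of-the-envelope calculation in plain Python. The numbers (768 dimensions, 96 sub-quantizers, one million vectors) are illustrative assumptions, and the estimate ignores codebooks, stored IDs and IVF overhead.

```python
def flat_bytes_per_vector(d: int) -> int:
    """Raw float32 storage: 4 bytes per dimension."""
    return 4 * d

def pq_bytes_per_vector(m: int, nbits: int = 8) -> int:
    """PQ code size: m sub-quantizers, nbits each, rounded up to whole bytes."""
    return (m * nbits + 7) // 8

d, m, n = 768, 96, 1_000_000               # m must divide d (768 / 96 = 8)
raw = n * flat_bytes_per_vector(d)
compressed = n * pq_bytes_per_vector(m)

print(raw / 1e9, "GB raw")                 # 3.072 GB raw
print(compressed / 1e6, "MB as PQ codes")  # 96.0 MB as PQ codes
print(raw // compressed, "x smaller")      # 32 x smaller
```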

4. ID Wrappers (Output Mapping Layer)

IndexIDMap

Maps FAISS internal positions → real-world IDs (chunk_id, doc_id); IDs are 64-bit integers passed to add_with_ids()

Without:

returns positions 0, 1, 2, ... in insertion order

With:

returns meaningful IDs for RAG retrieval

IndexIDMap2

Same as IndexIDMap, plus a reverse ID → position map, so reconstruct(id) can return the stored vector for a given ID

Key point:

Does NOT affect how the search is performed, only which IDs come back

5. Training Component

Required for:

IVF (clustering)
PQ (codebook learning)

Not required for:

Flat, HNSWFlat (index.is_trained is already True)

Function:

index.train(data)

What it does:

Learns clusters (IVF)
Learns quantization codebooks (PQ)

6. Vector Transformations (Preprocessing)

Normalization → unit length vectors (enables cosine via inner product)
PCA → dimensionality reduction
OPQ → learned rotation applied before PQ to reduce quantization error
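Why normalization enables cosine via inner product can be checked numerically. This is a NumPy-only sketch; `l2_normalize` is a helper written for this note (FAISS offers `faiss.normalize_L2`, which does the same thing in place on float32 arrays).

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=8).astype(np.float32)
b = rng.normal(size=8).astype(np.float32)

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row (or a single vector) to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, 1e-12)   # guard against zero vectors

# Cosine similarity computed from its definition ...
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
# ... equals the plain inner product once both vectors have unit length.
inner_of_units = float(l2_normalize(a) @ l2_normalize(b))

print(abs(cosine - inner_of_units) < 1e-5)   # True
```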

7. Similarity Metrics

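Inner Product (IP): larger = more similar; used for cosine similarity when vectors are normalized
L2 Distance: Euclidean distance, smaller = more similar; for unit-length vectors it ranks results the same way as inner product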

8. Search Optimization Parameters

IVF: nprobe → number of clusters searched per query (default 1)
HNSW: efSearch → size of the candidate list kept during graph traversal (default 16)

Higher = more accurate, slower

9. GPU Acceleration

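GpuIndexFlatIP / GpuIndexIVFFlat → GPU counterparts of the CPU indexes (GPU builds of FAISS only)
index_cpu_to_gpu() → copies a CPU index onto a GPU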

Used for:

large-scale / high-throughput inference

10. Index Factory

faiss.index_factory()

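Examples:

"Flat" → exact brute-force index
"IVF100,Flat" → IVF with 100 clusters, uncompressed vectors
"IVF100,PQ16" → IVF with 100 clusters, 16-byte PQ codes
"HNSW32" → HNSW graph with M = 32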

Purpose:

declarative index creation

11. Persistence Layer

faiss.write_index(index, path) → serialize an index to a file
faiss.read_index(path) → load it back

Used to save/load FAISS indexes in production

12. Metadata Layer (External System)

FAISS does NOT store text.

You must maintain:

chunk text
document IDs
metadata

IndexIDMap connects vectors → metadata system: search returns IDs, which you look up in your own store
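A plain-Python sketch of that external lookup. The `metadata` dict and the `fetch_chunks` helper are hypothetical stand-ins for a real database; the input lists mimic one row of the IDs and scores returned by a FAISS search.

```python
# Hypothetical metadata store keyed by the same IDs given to add_with_ids.
# In production this would typically be a database table or key-value store.
metadata = {
    1001: {"doc_id": "doc-a", "text": "FAISS searches vectors."},
    1002: {"doc_id": "doc-a", "text": "It does not store text."},
    2001: {"doc_id": "doc-b", "text": "Metadata lives elsewhere."},
}

def fetch_chunks(result_ids, scores):
    """Turn one row of FAISS search output into metadata records."""
    hits = []
    for chunk_id, score in zip(result_ids, scores):
        chunk_id = int(chunk_id)
        if chunk_id == -1:          # FAISS pads with -1 when fewer than k results exist
            continue
        record = metadata.get(chunk_id)
        if record is not None:      # skip IDs missing from the store
            hits.append({"chunk_id": chunk_id, "score": float(score), **record})
    return hits

# Stand-in for one row of IDs and scores from index.search(query, 3).
hits = fetch_chunks([2001, 1002, -1], [0.91, 0.75, -3.4e38])
print([h["chunk_id"] for h in hits])   # [2001, 1002]
```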

13. Full RAG Flow

Chunk text
Embed (BGE-M3 or similar)
Normalize vectors
Build FAISS index
- Flat / IVF / HNSW
- optionally IDMap
Embed the query with the same model and normalization
FAISS search
Retrieve chunk IDs
Fetch actual text from DB

14. Big Picture Mental Model

Flat → brute force exact search
IVF → clustered approximate search
HNSW → graph traversal search
PQ → compressed storage
IDMap → identity mapping layer
Training → structure learning step
GPU → acceleration layer

15. Key Engineering Tradeoffs

Flat: accuracy ↑, speed ↓
IVF: balanced; tunable with nprobe, needs training
HNSW: fast + accurate, memory heavy
PQ: memory efficient, less accurate