AI Study Roadmap

Embeddings & Vector Search

What is an embedding?

An embedding is a dense vector representation of text (or any modality) in a continuous high-dimensional space, such that semantically similar inputs land near each other. Modern embedding models produce vectors of 384–3072 dimensions.

Nearest-neighbor search over embeddings is the foundation of semantic search, RAG retrieval, recommendation, and deduplication.
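As a rough illustration of what nearest-neighbor search means here, the sketch below does an exact (brute-force) top-k search with NumPy. The helper name `top_k_cosine` and the 3-dimensional toy vectors are made up for this example; real embeddings have hundreds or thousands of dimensions, and production systems usually rely on an approximate index inside a vector store rather than a full scan.

```python
import numpy as np

def top_k_cosine(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (row index, cosine similarity) for the k corpus rows closest to query."""
    # Normalize both sides so a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q
    # argsort is ascending, so reverse it and keep the first k indices.
    order = np.argsort(sims)[::-1][:k]
    return [(int(i), float(sims[i])) for i in order]

# Toy "embeddings" (hypothetical values, for illustration only).
corpus = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.0, 0.0])

hits = top_k_cosine(query, corpus, k=2)
print(hits)  # rows 0 and 1 rank first because they point in almost the same direction
```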

Embedding model shortlist (2026)

| Provider | Model | Dimensions | Context (tokens) | Notes |
| --- | --- | --- | --- | --- |
| OpenAI | text-embedding-3-small | 1536 | 8191 | Fast, cheap default |
| OpenAI | text-embedding-3-large | 3072 | 8191 | Best quality |
| Voyage | voyage-3 | 1024 | 32000 | Strong on retrieval benchmarks |
| Cohere | embed-v4.0 | 1536 | 128000 | Multilingual, long context |
| Open source | BAAI/bge-m3 | 1024 | 8192 | Free, multilingual, hybrid |
| Open source | sentence-transformers/all-MiniLM-L6-v2 | 384 | 512 | Tiny, fast, weaker recall |
| Jina AI | jina-colbert-v2 | Multi-vector | 8192 | Late interaction (ColBERT), 89 languages |
| Google | gemini-embedding-002 | 3072 | 2048 | Multimodal (text+image+video+audio) |

Rule of thumb: start with text-embedding-3-small. If quality is not enough, switch to text-embedding-3-large or voyage-3. Only drop to MiniLM when you need to run locally or at zero cost.

Modern embedding techniques

Matryoshka Representation Learning (MRL)

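Models trained with MRL (e.g., text-embedding-3-large, gemini-embedding-002) produce embeddings whose first N dimensions form a valid lower-dimensional embedding. Truncating from 3072 to 256 dimensions cuts storage by about 92% (256/3072 ≈ 8% of the original size) for a modest quality loss: OpenAI's published MTEB averages for text-embedding-3-large are 64.6 at 3072 dimensions and 62.0 at 256. Measure the trade-off on your own data before committing to a size.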

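# OpenAI: request fewer dimensions (expects OPENAI_API_KEY in the environment)
from openai import OpenAI

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Hello world",
    dimensions=256,  # truncated MRL embedding
)
vector = response.data[0].embedding  # list of 256 floats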
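If full-size vectors are already stored, the same effect can be approximated client-side: keep the leading dimensions and re-normalize to unit length. The sketch below assumes the vectors come from an MRL-trained model (truncating other embeddings this way generally hurts quality much more); `truncate_embedding` is a hypothetical helper and the random vector is only a stand-in for a real embedding.

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components and rescale the result to unit length."""
    head = np.asarray(vec, dtype=np.float64)[:dims]
    norm = np.linalg.norm(head)
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return head / norm

# Stand-in for a stored 3072-dimensional embedding (random, for illustration only).
full = np.random.default_rng(0).normal(size=3072)
full /= np.linalg.norm(full)

short = truncate_embedding(full, 256)
storage_saved = 1 - short.size / full.size

print(short.shape)              # (256,)
print(round(storage_saved, 3))  # 0.917
```

Re-normalizing matters because the truncated prefix is no longer unit length, so dot-product scores would otherwise stop matching cosine similarity.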

Late interaction (ColBERT)

Instead of a single vector per document, ColBERT produces one vector per token. Retrieval uses MaxSim: for each query token, find the maximum similarity across all document tokens, then sum those maxima over the query tokens. This usually improves retrieval quality over single-vector models, at the cost of storing one vector per token instead of one per document.
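The MaxSim scoring rule described above can be sketched in a few lines of NumPy. The function name `maxsim_score` and the 2-dimensional token vectors are illustrative assumptions; a real ColBERT model would supply the per-token embeddings, and production engines typically add compression and candidate pruning on top.

```python
import numpy as np

def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Late-interaction score: sum over query tokens of the best-matching document token."""
    # Normalize rows so the matrix product below holds cosine similarities.
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sim = q @ d.T  # shape: (num_query_tokens, num_doc_tokens)
    # For each query token take its best document token, then sum over query tokens.
    return float(sim.max(axis=1).sum())

# Toy per-token embeddings (hypothetical values).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # covers both query tokens
doc_b = np.array([[1.0, 0.0], [1.0, 0.0]])              # covers only the first

print(maxsim_score(query, doc_a))
print(maxsim_score(query, doc_b))
```

Here `doc_a` scores higher because every query token finds a close match, while `doc_b` matches only one of them.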

Multimodal embeddings

gemini-embedding-002 (March 2026) maps text, images, video, audio, and PDFs into a single vector space. No OCR or captioning pipeline needed for document understanding.

Similarity metrics

| Metric | Formula | When to use |
| --- | --- | --- |
| Cosine | dot(a,b) / (‖a‖‖b‖) | Default for text embeddings. Range: -1..1. |
| Dot product | dot(a,b) | When vectors are already normalized (faster). |
| Euclidean (L2) | ‖a-b‖ | Image embeddings; rarely for text. |

For text, always prefer cosine unless your embedding model specifies otherwise. Most modern text embeddings are L2-normalized, making cosine and dot product equivalent.
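A quick numerical check of the equivalence, using random vectors as stand-ins for embeddings (the helper name `cosine` is just for this sketch). It also shows the related identity that, for unit vectors, squared Euclidean distance equals 2 − 2·cosine, so all three metrics produce the same ranking once vectors are normalized.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot(a, b) / (‖a‖‖b‖)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random stand-ins for two embeddings (not real model output).
rng = np.random.default_rng(42)
a, b = rng.normal(size=8), rng.normal(size=8)

# L2-normalize both vectors to unit length.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

cos_ab = cosine(a, b)
dot_unit = float(np.dot(a_unit, b_unit))
l2_sq_unit = float(np.sum((a_unit - b_unit) ** 2))

print(np.isclose(cos_ab, dot_unit))            # True: dot product of unit vectors is the cosine
print(np.isclose(l2_sq_unit, 2 - 2 * cos_ab))  # True: squared L2 on unit vectors is 2 - 2*cos
```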

Chunking strategy

Embeddings operate on chunks, not whole documents. Poor chunking is one of the most common causes of poor RAG retrieval quality.

Baseline (works most of the time):

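from langchain_text_splitters import RecursiveCharacterTextSplitter

# chunk_size and chunk_overlap are measured in characters by default (length_function=len).
splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=120,
    separators=["\n\n", "\n", ". ", " ", ""],  # tried in order, coarsest first
)
chunks = splitter.split_documents(docs)  # docs: a list of LangChain Document objects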
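To make the role of chunk_size and chunk_overlap concrete, here is a deliberately simplified fixed-window splitter in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break on the separators listed above); `sliding_chunks` is a hypothetical helper that only illustrates how overlap repeats the tail of one chunk at the head of the next.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(text, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so a passage that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.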

Advanced: semantic chunking (split on sentence-boundary similarity drops) or structure-aware chunking (split on Markdown headings, HTML sections). See Advanced Indexing.

Vector stores

pgvector (PostgreSQL + langchain-postgres)

Best choice when you already have Postgres. Supports rich metadata filtering via SQL.

pip install -qU langchain-postgres

Important: the package requires psycopg3. Connection string format: postgresql+psycopg://user:pass@host:5432/db (not postgresql+psycopg2://, which selects the older psycopg2 driver).

from langchain_postgres import PGVector
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

store = PGVector(
    embeddings=embeddings,
    collection_name="my_docs",
    connection="postgresql+psycopg://user:pass@localhost:5432/mydb",
)

store.add_documents(docs, ids=doc_ids)
results = store.similarity_search("query", k=5)

# Metadata filter operators: $eq, $ne, $lt, $in, $and, $or, $like, $ilike
results = store.similarity_search(
    "query",
    k=5,
    filter={"category": {"$eq": "policy"}, "year": {"$lt": 2025}},
)

Qdrant (langchain-qdrant)

Best for dense+sparse hybrid retrieval and when you want a dedicated vector database.

pip install -qU langchain-qdrant

from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

client = QdrantClient(":memory:")  # or url="http://localhost:6333"
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

store = QdrantVectorStore(
    client=client,
    collection_name="docs",
    embedding=embeddings,
)
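store.add_documents(docs, ids=uuids)  # uuids: one unique ID per document, e.g. str(uuid4())

# retrieval_mode options: "dense" (default), "sparse", "hybrid"
# ("sparse" and "hybrid" also require a sparse_embedding model on the store)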
results = store.similarity_search("query", k=5)

Chroma (langchain-chroma)

Best for quick prototyping and local-first workflows.

pip install -qU langchain-chroma

from langchain_chroma import Chroma

store = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    persist_directory="./chroma_db",
)
results = store.similarity_search("query", k=5)

Hybrid search: dense + sparse

Dense retrieval (vector similarity) excels at semantic understanding. Sparse retrieval (BM25) excels at exact keyword matching. Combining both via Reciprocal Rank Fusion (RRF) typically outperforms either alone — especially on queries with proper names, SKUs, or error codes.

# Reciprocal Rank Fusion: each list adds 1 / (k + rank) per document, with rank starting at 1
def rrf(rank_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranks in rank_lists:
        for r, doc_id in enumerate(ranks):  # r is 0-based, hence the + 1 below
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + r + 1)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
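A small worked example may help show how the fused order comes about. The variant below (a hypothetical `rrf_scores`, using the same formula) returns the scores alongside the IDs so they can be inspected; the document IDs and the two ranked lists are invented for illustration.

```python
def rrf_scores(rank_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Same fusion rule as above, but returns (doc_id, fused score) pairs."""
    scores: dict[str, float] = {}
    for ranks in rank_lists:
        for position, doc_id in enumerate(ranks):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position + 1)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical results from a dense retriever and a BM25 retriever.
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_c", "doc_a", "doc_d"]

fused = rrf_scores([dense, sparse])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
# doc_a 0.03252  (1/61 + 1/62: ranked 1st and 2nd)
# doc_c 0.03227  (1/63 + 1/61: ranked 3rd and 1st)
# doc_b 0.01613  (1/62: appears in one list only)
# doc_d 0.01587  (1/63: appears in one list only)
```

Documents that appear in both lists outrank those found by only one retriever, and because only ranks are used, the raw dense and BM25 scores never need to be put on a common scale.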

See Advanced Retrieval Strategies for the full walkthrough.

Re-ranking

After first-stage retrieval, pass the top-50 results through a cross-encoder (e.g., cross-encoder/ms-marco-MiniLM-L-6-v2 or Cohere’s rerank-v3.5) to re-score each query-document pair jointly with full attention. This typically lifts precision@k by 5–15 percentage points, though the gain depends on the corpus and the first-stage retriever.

See Re-ranking for details.
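The re-ranking step itself is small: score every (query, candidate) pair, sort, and keep the best few. The sketch below keeps the scorer pluggable so the flow can be run without downloading a model. `rerank`, `cross_encoder_scorer`, and `word_overlap_scorer` are hypothetical helper names; the cross-encoder path assumes the sentence-transformers package is installed, and the word-overlap scorer is only a toy stand-in, not a real re-ranker.

```python
from typing import Callable, Sequence

PairScorer = Callable[[list[tuple[str, str]]], Sequence[float]]

def rerank(query: str, candidates: list[str], score_pairs: PairScorer,
           top_k: int = 5) -> list[tuple[str, float]]:
    """Score every (query, candidate) pair and return the top_k by descending score."""
    pairs = [(query, text) for text in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(text, float(score)) for text, score in ranked[:top_k]]

def cross_encoder_scorer(model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2") -> PairScorer:
    """Build a scorer backed by a sentence-transformers CrossEncoder (downloads the model)."""
    from sentence_transformers import CrossEncoder  # imported lazily: optional dependency
    model = CrossEncoder(model_name)
    return lambda pairs: model.predict(pairs)

def word_overlap_scorer(pairs: list[tuple[str, str]]) -> list[float]:
    """Toy stand-in: number of lowercase words shared by query and candidate."""
    return [float(len(set(q.lower().split()) & set(t.lower().split()))) for q, t in pairs]

candidates = [
    "reset your password from the login page",
    "pricing plans and billing",
    "how to change a forgotten password",
]
top = rerank("reset password", candidates, word_overlap_scorer, top_k=2)
print(top)
```

Swapping in the real model is then a one-line change: `rerank(query, candidates, cross_encoder_scorer(), top_k=5)`.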

Practice

1. Embedding comparison

Embed the following three sentences with text-embedding-3-small and compute pairwise cosine similarity:

  1. “A dog is chasing a ball in the park.”

  2. “A puppy plays fetch in the garden.”

  3. “The stock market crashed yesterday.”

Expected: (1,2) ≳ 0.7; (1,3) and (2,3) ≲ 0.3. Verify your intuition.

2. Chunking sensitivity

Take a 20-page PDF and index it twice:

  • Run A: chunk_size=400, chunk_overlap=40

  • Run B: chunk_size=1200, chunk_overlap=180

Write 10 test questions and measure top-5 recall on both runs. Report which chunk size works better and why.
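One possible way to score the runs, assuming each test question has been labeled by hand with the IDs of the chunks that answer it. The helper names and the chunk IDs below are hypothetical; `precision_at_k` is included because exercise 5 asks for precision@5 on the same kind of labeled data.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of the relevant items that appear in the top-k retrieved results."""
    if not relevant_ids:
        raise ValueError("each question needs at least one relevant id")
    return len(set(retrieved_ids[:k]) & relevant_ids) / len(relevant_ids)

def precision_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved results that are relevant."""
    top = retrieved_ids[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant_ids) / len(top)

def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Average recall@k over (retrieved_ids, relevant_ids) pairs, one pair per question."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)

# Two hypothetical questions: retrieved chunk ids in rank order, plus the labeled answers.
run_a = [
    (["c1", "c7", "c3", "c9", "c2", "c4"], {"c3", "c4"}),  # c4 is at rank 6, outside top-5
    (["c5", "c6"], {"c5"}),
]
print(mean_recall_at_k(run_a, k=5))         # 0.75
print(precision_at_k(run_a[0][0], run_a[0][1], k=5))  # 0.2
```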

3. pgvector end-to-end

Spin up Postgres with the pgvector extension (Docker one-liner works). Index 1,000 Wikipedia articles using text-embedding-3-small via langchain-postgres. Implement a metadata filter so that a query can be restricted to articles from a specific category. Verify both the unrestricted and filtered searches return reasonable results.

4. Hybrid search with RRF

Build a hybrid search that combines:

  1. Dense vector search (any of the three stores above).

  2. BM25 keyword search via rank-bm25.

Fuse the two ranked lists with Reciprocal Rank Fusion (k=60). Compare the fused top-5 against the dense-only top-5 on 10 queries that include proper names or codes. Hybrid should outperform dense-only on those queries.

5. Cross-encoder re-ranking

Retrieve top-50 from your best retriever. Re-rank with cross-encoder/ms-marco-MiniLM-L-6-v2 and keep the top-5. Measure precision@5 before and after re-ranking on your test set.

Expected: precision@5 should improve by 5–15 percentage points.

Review Questions

  1. What similarity metric is the default choice for text embeddings?

    • A. Cosine similarity

    • B. Manhattan distance

    • C. Jaccard index

    • D. Hamming distance

  2. For most teams starting a RAG project, which embedding model is the recommended first choice?

    • A. all-MiniLM-L6-v2 (local, 384 dim)

    • B. text-embedding-3-small (OpenAI, 1536 dim)

    • C. A custom model trained from scratch

    • D. text-embedding-ada-002 (legacy)

  3. What is the pgvector connection string format required by langchain-postgres?

    • A. postgres://user:pass@host/db

    • B. postgresql+psycopg2://user:pass@host/db

    • C. postgresql+psycopg://user:pass@host/db (psycopg3)

    • D. jdbc:postgresql://host/db

  4. Why does hybrid search (dense + sparse) outperform dense-only on queries containing exact SKUs or error codes?

    • A. Sparse retrieval is faster

    • B. BM25 excels at exact keyword matching, which pure vector similarity can miss

    • C. Hybrid uses more memory

    • D. Dense vectors don’t work at all

  5. Which Qdrant parameter enables hybrid dense+sparse retrieval?

    • A. search_type="both"

    • B. retrieval_mode="hybrid"

    • C. mode="mixed"

    • D. use_bm25=True

  6. What is the primary purpose of chunk overlap when splitting documents?

    • A. To make chunks larger for free

    • B. To avoid cutting a relevant passage exactly at a chunk boundary, which would make it unretrievable

    • C. To confuse the embedding model

    • D. To save disk space

  7. A cross-encoder re-ranker is used after which step in a typical RAG pipeline?

    • A. Before the first-stage retriever runs

    • B. After first-stage retrieval, to re-score the top-N candidates with deeper attention

    • C. Instead of the embedding model

    • D. On the final answer text

  8. What is Reciprocal Rank Fusion (RRF) used for?

    • A. Training embedding models

    • B. Combining ranked lists from multiple retrievers into a single fused ranking

    • C. Compressing vectors

    • D. Chunking documents

  9. When would you prefer Chroma over pgvector for a vector store?

    • A. Never — always use pgvector

    • B. For quick prototyping and local-first workflows without an existing Postgres instance

    • C. When you need SQL metadata filters

    • D. When you have >100M vectors

  10. Most modern text embeddings are L2-normalized. What does this imply about cosine similarity vs dot product?

    • A. They produce different rankings

    • B. They produce equivalent rankings (cosine = dot product when vectors are unit length)

    • C. Dot product is always slower

    • D. Cosine is impossible to compute

Answer Key
  1. A — Cosine is the default for text embeddings.

  2. B — text-embedding-3-small is the recommended starting point; upgrade to -large or voyage-3 if quality demands.

  3. C — langchain-postgres requires the psycopg3 driver.

  4. B — BM25 catches exact matches that dense vectors often miss.

  5. B — Qdrant’s retrieval_mode parameter with "hybrid" enables dense+sparse search.

  6. B — Overlap ensures relevant passages aren’t lost at chunk boundaries.

  7. B — Re-ranking operates on the top-N candidates from first-stage retrieval.

  8. B — RRF fuses multiple ranked lists, typically combining dense and sparse results.

  9. B — Chroma is great for local prototyping; pgvector wins when Postgres is already in your stack.

  10. B — With unit-length vectors, cosine similarity and dot product produce identical rankings; dot product is just faster.


Last updated on 16/04/2026 14:22 (UTC+7).

Copyright © 2025-2026 FSOFT.FHN.NGT AI Vanguard team.