AI Study Roadmap


Assignment: Hybrid Search

Assignment Metadata

| Field | Description |
| --- | --- |
| Assignment Name | Hybrid Search with BM25 and Reciprocal Rank Fusion |
| Course | RAG and Optimization |
| Project Name | hybrid-search-rag |
| Estimated Time | 90 minutes |
| Framework | Python 3.11+, LangChain 1.x, rank-bm25, Sentence-Transformers, ChromaDB |


Learning Objectives

By completing this assignment, you will be able to:

  • Implement BM25 keyword search alongside vector-based semantic search

  • Apply Reciprocal Rank Fusion (RRF) to merge results from multiple retrievers

  • Compare the effectiveness of Vector Search, BM25, and Hybrid Search

  • Configure the fusion parameters to optimize retrieval quality

  • Analyze scenarios where Hybrid Search outperforms single-method approaches


Problem Description

Your RAG system currently relies solely on Vector Search for retrieval. While this works well for semantic queries, users report poor results when searching for:

  • Specific error codes (e.g., “Error 503 Service Unavailable”)

  • Product SKUs and model numbers

  • Technical terms and acronyms

  • Proper names and exact phrases

Your task is to implement a Hybrid Search system that combines BM25 keyword matching with Vector Search, using RRF to merge the results.


Technical Requirements

Environment Setup

  • Python 3.11 or higher

  • Required packages:

    • langchain >= 1.0

    • rank-bm25 >= 0.2.2

    • sentence-transformers >= 2.2.0

    • chromadb >= 0.4.0

    • nltk >= 3.8.0 (for tokenization)

Dataset

Prepare a dataset that includes documents with:

  • Technical specifications with codes/numbers

  • Natural language descriptions

  • Mixed content (code snippets, prose, tables)

  • At least 100 documents for meaningful comparison


Tasks

Task 1: Implement BM25 Retriever (25 points)

  1. Build a BM25 retriever that:

    • Tokenizes documents properly (handle punctuation, case normalization)

    • Indexes all documents in your corpus

    • Returns top-K documents with BM25 scores

  2. Test with keyword-heavy queries:

    • Create at least 5 queries containing specific codes, numbers, or technical terms

    • Verify that BM25 correctly retrieves documents with exact keyword matches
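To make the moving parts of Task 1 concrete, here is a minimal from-scratch sketch of tokenization and BM25 scoring using only the standard library. It is an illustration, not the rank-bm25 API: the names `tokenize`, `TinyBM25`, and `sample_docs` are hypothetical, the regex tokenizer is one possible choice (the assignment suggests nltk), and the IDF variant used here (`log(1 + (N - n + 0.5) / (n + 0.5))`) is an assumption, so scores will not match rank-bm25 exactly.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercases text and keeps hyphenated or dotted
# codes such as "sku-4471" or "v2.1" as single tokens.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:[-_.][a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class TinyBM25:
    """Illustrative BM25 index (assumed defaults: k1=1.5, b=0.75)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in tokens]       # term frequency per document
        self.doc_len = [len(t) for t in tokens]
        self.avgdl = sum(self.doc_len) / max(len(docs), 1)
        df = Counter(term for t in tokens for term in set(t))  # document frequency
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query_tokens: list[str], i: int) -> float:
        total = 0.0
        for term in query_tokens:
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            # Length normalization: longer-than-average documents are penalized.
            norm = 1 - self.b + self.b * self.doc_len[i] / self.avgdl
            total += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * norm)
        return total

    def search(self, query: str, top_k: int = 5) -> list[tuple[int, float]]:
        """Return (doc_index, score) pairs, best first, omitting zero scores."""
        q = tokenize(query)
        scored = [(i, self.score(q, i)) for i in range(len(self.docs))]
        scored = [(i, s) for i, s in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:top_k]


sample_docs = [
    "Error 503 Service Unavailable: the upstream server is overloaded.",
    "The service was slow to respond during peak traffic.",
    "Replace part SKU-4471 before restarting the pump.",
]
index = TinyBM25(sample_docs)
top_hit = index.search("error 503")[0][0]  # index of the best-matching document
```

Keeping codes like `SKU-4471` as one token is a design choice worth testing against your own corpus: splitting on every punctuation mark would let a query for `4471` match, but would also let unrelated documents containing `sku` score.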

Task 2: Implement Hybrid Search with RRF (35 points)

  1. Create a Hybrid Retriever that:

    • Executes both BM25 and Vector Search in parallel

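    • Implements RRF score calculation: RRF(d) = Σ_r 1/(k + rank_r(d)), summed over every retriever r that returned document d, where rank_r(d) is the 1-based position of d in that retriever's result list

    • Uses a configurable constant k (default: 60)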

    • Returns merged and re-ranked results

  2. Handle edge cases:

    • Documents appearing in only one result list

    • Ties in RRF scores

    • Empty results from one retriever
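The requirements above can be sketched with the standard library alone. This is one possible shape, not a reference solution: `rrf_fuse` and `hybrid_search` are hypothetical names, each retriever is assumed to be a callable that maps a query string to a ranked list of document IDs (best first, no duplicates within a list), and ties are broken alphabetically by document ID, which is an arbitrary but deterministic choice.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def rrf_fuse(rankings, k=60, top_n=None):
    """Merge ranked lists of doc IDs with Reciprocal Rank Fusion.

    A document missing from a list simply contributes nothing for that list,
    and an empty list contributes nothing at all.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc ID so output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))
    return fused if top_n is None else fused[:top_n]


def hybrid_search(query, retrievers, k=60, top_n=5):
    """Run every retriever concurrently, then fuse their rankings."""
    with ThreadPoolExecutor(max_workers=max(len(retrievers), 1)) as pool:
        rankings = list(pool.map(lambda retrieve: retrieve(query), retrievers))
    return rrf_fuse(rankings, k=k, top_n=top_n)


# Stand-in retrievers for illustration only; real ones would query BM25 and ChromaDB.
def fake_bm25(query):
    return ["d1", "d2", "d3"]


def fake_vector(query):
    return ["d3", "d1", "d4"]


results = hybrid_search("error 503", [fake_bm25, fake_vector])
ranked_ids = [doc_id for doc_id, _ in results]
```

With these stand-in rankings, `d1` and `d3` appear in both lists and therefore outrank `d2` and `d4`, which appear in only one. Threads are used here on the assumption that both retrievers spend most of their time in I/O or native code; for an async stack, `asyncio.gather` would play the same role.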

Task 3: Comparative Evaluation (40 points)

  1. Create a test set with 20 queries categorized as:

    • Keyword queries (5): Exact matches, codes, identifiers

    • Semantic queries (5): Conceptual questions, synonyms

    • Hybrid queries (10): Mix of keywords and semantic intent

  2. Evaluate each retrieval method (Vector, BM25, Hybrid):

    • Precision@5: Proportion of relevant documents in top 5

    • Recall@10: Proportion of all relevant documents retrieved in top 10

    • Mean Reciprocal Rank (MRR): Average of 1/rank of first relevant result

  3. Create a comparison table showing:

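| Query Type | Method | Precision@5 | Recall@10 | MRR |
| --- | --- | --- | --- | --- |
| Keyword | Vector | | | |
| Keyword | BM25 | | | |
| Keyword | Hybrid | | | |
| Semantic | Vector | | | |
| Semantic | BM25 | | | |
| Semantic | Hybrid | | | |
| Hybrid | Vector | | | |
| Hybrid | BM25 | | | |
| Hybrid | Hybrid | | | |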
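The three metrics listed in step 2 can be computed per query and then averaged per query type to fill in the table. The sketch below is a minimal illustration with hypothetical function names; it assumes `retrieved` is a ranked list of document IDs and `relevant` is the set of IDs you labeled relevant for that query. Precision@k divides by k even when fewer than k results are returned, which is one common convention.

```python
def precision_at_k(retrieved, relevant, k=5):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k=10):
    """Fraction of all relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in set(retrieved[:k]) if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Made-up example data for a single query.
retrieved = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d1", "d2", "d5"}

p5 = precision_at_k(retrieved, relevant)    # 2 of the top 5 are relevant
r10 = recall_at_k(retrieved, relevant)      # 2 of the 3 relevant docs were found
rr = reciprocal_rank(retrieved, relevant)   # first relevant result is at rank 3
```

Here `p5` is 0.4, `r10` is 2/3, and `rr` is 1/3. Recall@10 depends on knowing all relevant documents for a query, so it is only as reliable as your relevance labels.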


Submission Requirements

Required Deliverables

  • Source code (Jupyter notebook or Python scripts)

  • README.md with setup and usage instructions

  • Evaluation results table (as shown above)

  • Analysis document explaining when each method excels

  • Screenshots showing example queries and retrieved documents

Submission Checklist

  • BM25 retriever correctly matches keywords

  • RRF fusion produces valid merged rankings

  • Evaluation covers all three query types

  • Code is well-documented with comments

  • Analysis includes specific examples


Evaluation Criteria

| Criteria | Points |
| --- | --- |
| BM25 implementation correctness | 15 |
| Tokenization and preprocessing | 10 |
| RRF implementation accuracy | 25 |
| Hybrid retriever edge case handling | 10 |
| Evaluation methodology | 15 |
| Comparative analysis quality | 15 |
| Code quality and documentation | 10 |
| Total | 100 |


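Hints

  • The rank-bm25 library provides a ready-made BM25 implementation (for example, its BM25Okapi class)

  • Use nltk.word_tokenize() for consistent tokenization, and apply the same tokenizer to documents and queries; it requires downloading the NLTK Punkt tokenizer data first

  • Test RRF with small examples first to verify your formula

  • Consider using the companion notebook 02-hybrid-search-rag.ipynb as a reference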

  • For the evaluation, manually label at least the top 10 results per query as relevant/not relevant
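As an example of the kind of small hand check the RRF hint suggests, suppose (hypothetically) that document A is ranked 1st by BM25 and 3rd by Vector Search, while document B is ranked 2nd by BM25 and is missing from the vector results. With the default k = 60:

```latex
\begin{aligned}
\mathrm{RRF}(A) &= \frac{1}{60 + 1} + \frac{1}{60 + 3} = \frac{1}{61} + \frac{1}{63} \approx 0.03227 \\
\mathrm{RRF}(B) &= \frac{1}{60 + 2} = \frac{1}{62} \approx 0.01613
\end{aligned}
```

A ranks above B because it earns a contribution from both lists; your implementation should reproduce these values for the same inputs.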

Last updated on 16/04/2026 14:22 (UTC+7).

Copyright © 2025-2026 FSOFT.FHN.NGT AI Vanguard team.