Assignment: Context Engineering#

Assignment Metadata#

Field	Description
Assignment Name	Context Engineering for Production RAG
Course	AI Integration
Project Name	`context-engineering-lab`
Estimated Time	120 minutes
Framework	Python 3.11+, LangChain 1.x, OpenAI API, tiktoken

Learning Objectives#

By completing this assignment, you will be able to:

Design a token budget for a multi-component context window
Implement a context assembly pipeline with retrieval and tool outputs
Build a conversation history manager with summarization-based truncation
Optimize context layout for caching and grounding
Measure how context allocation affects output quality

Problem Description#

You are building a technical support agent that needs to assemble context from multiple sources — retrieved documentation, customer profiles, ticket history, and conversation history — within a fixed token budget. Your task is to implement the context assembly pipeline and measure how different budget allocations affect response quality.

Technical Requirements#

Environment Setup#

Python 3.11 or higher
Required packages:
- langchain[openai] >= 1.0
- tiktoken >= 0.7.0

Dataset#

Prepare a test set with:

At least 10 support questions spanning different categories
A mock documentation corpus (at least 20 chunks)
Mock customer profiles and ticket histories

Tasks#

Task 1: Token Budget Design (20 points)#

Define a token budget for a 128k context window:
- Allocate tokens across: system prompt, few-shot examples, retrieved docs, tool outputs, conversation history, and response reserve
- Justify your allocation decisions
Implement a budget tracker that:
- Counts tokens per component using tiktoken
- Warns when a component exceeds its allocation
- Reports total utilization as a percentage
Test your budget with at least 3 scenarios: minimal context, typical context, and maximum context

Task 2: Dynamic Context Assembly (25 points)#

Build an assembly pipeline that:
- Retrieves relevant documentation chunks (top-5 by relevance)
- Fetches customer profile data
- Injects recent ticket history
- Combines all components into a single prompt
Implement relevance filtering that:
- Scores retrieved documents by relevance
- Drops documents below a similarity threshold
- Re-ranks remaining documents (most relevant first and last)
Add source attribution so the model can cite [Source N] in its responses

Task 3: Conversation History Management (25 points)#

Implement a history manager that:
- Tracks token usage for conversation history
- Triggers summarization when history exceeds its budget
- Always preserves the system message and last 3 exchanges
Compare two truncation strategies:
- Simple drop: remove oldest messages
- Summarize and compress: use an LLM to summarize old messages
Measure the impact on response coherence across a 10-turn conversation

Task 4: Caching and Grounding (30 points)#

Restructure your context for cache efficiency:
- Identify which components are stable vs. dynamic
- Place stable components first in the context
- Measure cache hit rate across 10 sequential requests
Implement grounding checks:
- Add instructions that require source citation
- Verify that responses reference provided context
- Measure faithfulness (% of claims supported by context)
Run an experiment varying retrieved context budget from 2,000 to 20,000 tokens:
- Measure response quality (LLM-as-judge, 1–5 scale)
- Measure faithfulness score
- Plot the results and identify the optimal allocation

Evaluation Criteria#

Criteria	Points
Token budget design & implementation	20
Dynamic context assembly	25
Conversation history management	25
Caching and grounding experiments	30
Total	100

Hints#

Tip

Use tiktoken.encoding_for_model("gpt-4o") for accurate token counting
Start with a simple budget and iterate — don’t over-optimize on the first pass
For the grounding experiment, use a simple LLM-as-judge prompt to score faithfulness
The lost-in-the-middle effect is real — test with most-relevant-first vs. random ordering
Log the actual tokens per component for each request so you can debug budget violations