Quiz#

Context Engineering#

Question 1: What is the key distinction between prompt engineering and context engineering?

A. Prompt engineering focuses on model selection; context engineering focuses on temperature settings.
B. Prompt engineering focuses on the instruction text; context engineering treats the entire context window as an engineered artifact.
C. They are the same thing with different names.
D. Context engineering only applies to RAG systems.

Answer: B

Question 2: In a context budget for a 128k token window, why should you reserve tokens for the response?

A. The model needs tokens for its output; without a reserve, it may truncate its response.
B. Reserved tokens are used for caching.
C. The API requires a minimum response allocation.
D. Response tokens are cheaper than input tokens.

Answer: A

Question 3: What is “write context” in Karpathy’s framing?

A. Context that the model writes during generation.
B. Information authored and controlled offline — system prompts, few-shot examples, output schemas.
C. The portion of context that gets cached.
D. User messages and conversation history.

Answer: B

Question 4: What is the “lost-in-the-middle” problem?

A. LLMs forget the system prompt after 10 conversation turns.
B. Information in the middle of the context window receives less attention than information at the beginning or end.
C. Retrieved documents lose relevance after being cached.
D. Token counts are inaccurate for content in the middle of long prompts.

Answer: B

Question 5: Which truncation strategy is best for conversation history that has grown beyond its token budget?

A. Delete the system prompt to free tokens.
B. Summarize older messages while preserving recent exchanges.
C. Randomly remove messages until under budget.
D. Switch to a model with a larger context window.

Answer: B

Question 6: How should you structure a context window to maximize cache hit rate?

A. Place dynamic content (user query, tool outputs) first, followed by stable content.
B. Randomly shuffle all components on each request.
C. Place stable content (system prompt, few-shot examples) first, followed by dynamic content.
D. Send each component as a separate API call.

Answer: C

Question 7: Why does injecting irrelevant retrieved documents increase hallucination?

A. Irrelevant documents consume the model’s attention budget, and the model may treat irrelevant content as factual context.
B. Irrelevant documents cause the API to return errors.
C. The model always ignores irrelevant documents, so it has no effect.
D. Irrelevant documents only affect latency, not quality.

Answer: A

Question 8: You are building a customer support agent. The retrieved documentation, customer profile, and ticket history together exceed the context budget. What is the correct approach?

A. Remove the system prompt to make room.
B. Apply priority-based truncation: reduce the lowest-priority component first (e.g., limit ticket history to the 3 most recent tickets).
C. Send the request anyway and hope the model handles it.
D. Switch to a model with unlimited context.

Answer: B

Question 9: What is source attribution in context engineering?

A. Crediting the authors of the LLM training data.
B. Labeling each retrieved document chunk with a source identifier so the model can cite it in responses.
C. Tracking which API key was used for the request.
D. Recording the timestamp of each retrieval.

Answer: B

Question 10: A production context assembly pipeline should include a “context validation” step. What does this mean?

A. Validating the user’s API key before sending the request.
B. Logging and monitoring the actual content the model receives, to debug quality issues.
C. Running spell-check on the context.
D. Verifying the model’s response before returning it.

Answer: B

Question 11: Why is “context stuffing” (dumping all available information into the window) an anti-pattern?

A. It is too expensive to send large contexts.
B. It degrades output quality because irrelevant information dilutes the signal and the model may attend to noise instead of the relevant content.
C. LLMs reject requests that are too long.
D. It only affects latency, not quality.

Answer: B

Question 12: In progressive disclosure, how should a context engineering pipeline handle a complex user request?

A. Retrieve and inject all possible documents upfront.
B. Start with summaries of relevant topics, then fetch detailed content only for the specific aspect the user asks about.
C. Only use the system prompt and skip retrieval entirely.
D. Ask the user to simplify their question.

Answer: B