Prompt Engineering Quick Reference#
Core principles#
Be specific. Vague asks (“summarize this”) yield vague answers. State the audience, length, format, and success criteria.
Show, don’t tell. A single well-chosen example is worth a paragraph of instructions.
Separate concerns. Use XML-style tags or Markdown headings to isolate instructions, context, and examples.
Constrain the output. Ask for JSON / a specific schema / a fixed structure whenever downstream code will parse the response.
Let the model think. For any non-trivial reasoning task, allow intermediate reasoning before the final answer.
Role assignment#
Open with a system message describing the persona and scope. This is the single highest-leverage prompt technique.
You are a senior Python code reviewer at a fintech company.
Your job is to flag security issues (OWASP Top 10), then style problems.
Only comment on things that would block a merge. Be terse.
Few-shot examples#
Provide 2–5 input→output pairs that cover the edge cases you care about. The model will generalize from the pattern.
Convert each product description to a JSON tag list.
Example 1:
Input: "Wireless noise-cancelling over-ear headphones."
Output: ["wireless", "noise-cancelling", "over-ear", "headphones"]
Example 2:
Input: "Stainless steel insulated water bottle, 500ml."
Output: ["stainless-steel", "insulated", "water-bottle", "500ml"]
Now:
Input: "{user_input}"
Output:
Chain-of-thought (CoT)#
Ask the model to reason step-by-step before answering. On Claude 4.6 and other reasoning models, this is often implicit via extended/adaptive thinking, but the prompt still helps for older or cheaper models.
Think through the problem carefully, then give the final answer.
Problem: A train leaves station A at 3pm going 60 mph. Another leaves
station B at 4pm going 80 mph toward A. Stations are 280 miles apart.
When do they meet?
Reasoning:
For models that expose reasoning tokens natively (Claude Opus 4.6,
Sonnet 4.6 with adaptive thinking; OpenAI o-series; Gemini 3 with
ThinkingConfig), you usually do not need to add “think step by step”
manually — the model handles it internally.
Structured output#
Always prefer schema-constrained output over regex-parsing free text.
With the Anthropic SDK#
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
tools=[{
"name": "extract_contact",
"description": "Extract structured contact info",
"input_schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"email": {"type": "string"},
"phone": {"type": "string"},
},
"required": ["name", "email"],
},
"strict": True, # guarantees schema conformance
}],
tool_choice={"type": "tool", "name": "extract_contact"},
messages=[{"role": "user", "content": "Jane Doe, jane@acme.co, 555-0100"}],
)
With LangChain create_agent#
from pydantic import BaseModel, Field
from langchain.agents import create_agent
class ContactInfo(BaseModel):
name: str = Field(description="Full name")
email: str = Field(description="Email address")
phone: str | None = Field(default=None, description="Phone if present")
agent = create_agent(
model="anthropic:claude-sonnet-4-6",
tools=[],
response_format=ContactInfo,
)
result = agent.invoke({"messages": [{"role": "user", "content": "..."}]})
print(result["structured_response"]) # ContactInfo instance
XML-style tagging (Anthropic-preferred)#
Claude is trained to respect XML-ish tags in prompts. Use them to isolate sections that the model should treat as data rather than instructions.
Review the pull request below and return a list of blocking issues.
<diff>
{pull_request_diff}
</diff>
<style_guide>
{internal_style_guide}
</style_guide>
Respond with a JSON array of {"line": int, "severity": str, "message": str}.
Negative instructions — use sparingly#
LLMs can interpret negative instructions (“don’t do X”) as emphasis on X. Rephrase as positive directives whenever possible.
❌ “Don’t use passive voice.”
✅ “Write every sentence in active voice.”
Grounding and anti-hallucination#
When the model must cite sources, be explicit:
Answer only using information contained in <context>. If the answer is
not in the context, reply exactly: "I don't know based on the provided
sources." Cite the source ID in square brackets after each claim.
<context>
[src-1] ...
[src-2] ...
</context>
Question: {question}
Temperature and sampling#
temperature=0.0— deterministic, best for extraction, classification, code generation.temperature=0.7— balanced, good for most conversational tasks.temperature=1.0— Gemini 3 default and recommended; unusual to change.
For OpenAI Responses API and Anthropic messages with thinking enabled, temperature effects are muted — the model already has internal variance.
Prompt caching (long static context)#
If you send the same 50KB system prompt on every request, enable prompt caching so you pay the cache-read rate (typically 0.1× input) instead of the full input rate.
client.messages.create(
model="claude-sonnet-4-6",
system=[
{"type": "text", "text": "You are..."},
{
"type": "text",
"text": long_style_guide_text,
"cache_control": {"type": "ephemeral"},
},
],
messages=[...],
)
See Observability: LangFuse & LangSmith for tracing these patterns in production.
Practice#
1. Rewrite a vague prompt#
Take the prompt: "summarize this article".
Rewrite it with:
An explicit audience (“a busy executive who has 30 seconds”)
A target length (“3 bullet points, each under 15 words”)
A format (“plain text, no Markdown”)
A success criterion (“must name the two parties and the dollar amount”)
Run both prompts through the same model and compare the outputs.
2. Few-shot classification#
Build a prompt that classifies customer support tickets into one of
billing, bug, feature-request, other. Provide 4 examples in the
prompt. Test with 10 held-out tickets and measure accuracy.
Target: ≥90% accuracy on a balanced test set using
claude-sonnet-4-6 at temperature=0.
3. Structured extraction with strict tool use#
Use the Anthropic SDK with strict: true to extract the following from
a job posting:
{
"title": str,
"company": str,
"location": str,
"remote": bool,
"salary_min": int | None,
"salary_max": int | None,
"required_skills": list[str],
}
Verify the response parses as valid JSON on 20 real postings without
any try/except fallbacks.
4. Prompt caching measurement#
Set up a long system prompt (~2,000 tokens). Issue 10 identical user
queries. Compare total cost with and without cache_control using the
usage.cache_creation_input_tokens and usage.cache_read_input_tokens
fields from the response.
Expected: after the first request, subsequent requests should report most input tokens as cache reads, cutting cost by ~90% on the cached portion.
5. Grounding and refusal#
Build a RAG-style prompt that must answer only from a provided
<context> block. Test with 5 questions that are answerable from the
context and 5 that are not. The model should refuse exactly on the 5
unanswerable ones with the literal string "I don't know based on the provided sources.".
Measure refusal precision and recall.
Review Questions#
Which prompt technique consistently produces the largest quality improvement with the smallest token budget?
A. Setting temperature to 1.0
B. Assigning a clear role in the system message
C. Using ALL CAPS for important instructions
D. Adding the word “please” to every instruction
You need to extract a strict JSON schema from unstructured text. Which option is most reliable?
A. Ask the model to “return JSON” in the prompt and parse it yourself
B. Use tool use with
strict: true(Anthropic) orresponse_format(OpenAI / LangChain)C. Regex the output
D. Set temperature to 2.0 for variety
For Claude Opus 4.6 with adaptive thinking enabled, do you still need to write “think step by step” in the prompt?
A. Yes, always
B. No — the model reasons internally; an explicit CoT instruction is usually redundant
C. Only on weekends
D. Only for math problems
What is the recommended temperature for Gemini 3 models?
A. 0.0
B. 0.5
C. 1.0 (default)
D. 2.0
Why is XML-style tagging particularly effective with Claude?
A. Claude was trained to respect XML-ish tags for isolating instructions from data
B. XML is faster to parse than JSON
C. It reduces token count
D. It’s required by the API
Your system prompt is 50KB and never changes between requests. How do you cut cost?
A. Truncate it to 5KB
B. Enable prompt caching with
cache_control: {"type": "ephemeral"}C. Send it only on the first request
D. Use a smaller model
What is the failure mode of negative instructions like “Don’t use passive voice”?
A. The model throws an error
B. LLMs sometimes interpret the negation as emphasis and do the opposite — prefer positive directives
C. They cost more tokens
D. They are case-sensitive
A RAG system should refuse to answer when the context is insufficient. How do you enforce this?
A. Lower the temperature
B. Explicitly instruct the model to reply with a fixed refusal string when the answer is not in the context
C. Hope it works
D. Use a larger model
When building a few-shot classifier, how many examples are typically enough?
A. Exactly 1
B. 2–5 examples that cover the edge cases
C. At least 100
D. As many as the context window allows
Which parameter makes tool use schema-conforming on the Anthropic API?
A.
temperature: 0B.
strict: trueon the tool definitionC.
json_mode: trueD.
force_schema: true
View Answer Key
B — Role assignment is the single highest-leverage technique.
B — Schema-constrained tool use beats regex parsing every time.
B — Reasoning models handle CoT internally; an explicit “think step by step” is redundant (and sometimes harmful).
C — Google explicitly recommends keeping
temperature=1.0default for Gemini 3 models.A — Claude is trained to recognize XML-ish tags as structural boundaries.
B — Prompt caching cuts cached reads to ~0.1× input cost.
B — Rephrase as positive directives (“Write in active voice”).
B — Explicit refusal instructions with a fixed string make refusals detectable and measurable.
B — 2–5 carefully chosen examples typically get you most of the benefit.
B —
strict: trueon the tool guarantees the model’s output matches the declared schema.