
Prompt Engineering Fundamentals: A Practical Guide for 2025

The prompts that worked in 2023 are different from what works today. This guide covers the core techniques — role prompting, chain-of-thought, few-shot examples, and more — with examples tested across 10+ models.

Travis Johnson

Founder, Deepest

April 1, 2025 · 15 min read

The prompts that worked in 2023 are different from what works today. Models have changed, capabilities have shifted, and the research on what actually produces better outputs has matured. This guide covers what reliably works — tested across GPT-4o, Claude 3.5 Sonnet, Gemini 2.0, and 10+ other models.

Why Prompt Engineering Still Matters

With more capable models, you might expect prompting to matter less. The opposite is true. More capable models are more responsive to prompt structure — they can follow complex instructions precisely, adapt their style to explicit direction, and reason through multi-step problems when guided correctly. The ceiling on what good prompting can achieve has risen dramatically.

The basics — be specific, give context, specify format — haven't changed. But the advanced techniques have evolved considerably, and some popular advice from 2023 (like "act as an expert in X") has lost most of its effectiveness as models have been trained to resist role-playing patterns.

Core Techniques That Reliably Work

1. Specify the Output Format Explicitly

The single highest-leverage change most people can make: tell the model exactly what format you want. Don't hint at it — state it.

Weak prompt: "Summarize this article."

Strong prompt: "Summarize this article in exactly 3 bullet points, each under 20 words, focusing only on the practical implications for software developers."

Format instructions work for: bullet points, numbered lists, tables, word count limits, heading structure, code blocks, markdown, HTML, JSON, and custom structures. Models follow explicit format instructions far more reliably than they follow implied ones.
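One advantage of explicit format instructions is that you can check compliance programmatically and retry when the model drifts. A minimal sketch in Python, using the bullet-point example above (the constraint values and the `follows_format` check are illustrative, not from any particular library):

```python
# Encode the format requirements once, then check any response against them.
# The constraint values mirror the "strong prompt" example above.
MAX_BULLETS = 3
MAX_WORDS_PER_BULLET = 20

def build_summary_prompt(article: str) -> str:
    """Build a summarization prompt with explicit format constraints."""
    return (
        f"Summarize this article in exactly {MAX_BULLETS} bullet points, "
        f"each under {MAX_WORDS_PER_BULLET} words, focusing only on the "
        "practical implications for software developers.\n\n" + article
    )

def follows_format(response: str) -> bool:
    """Return True if the response is exactly MAX_BULLETS short bullets."""
    bullets = [line for line in response.splitlines()
               if line.strip().startswith("-")]
    return (
        len(bullets) == MAX_BULLETS
        and all(len(b.lstrip("- ").split()) < MAX_WORDS_PER_BULLET
                for b in bullets)
    )
```

If `follows_format` fails, you can re-run the prompt or tighten the instruction, rather than manually eyeballing every response.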

2. Chain-of-Thought for Complex Reasoning

For any task involving reasoning, calculation, or multi-step logic, instruct the model to show its work before giving a final answer. This dramatically reduces errors because the model catches mistakes in its own reasoning chain before committing to a conclusion.

How to apply it: Add "Think through this step by step before giving your final answer" or "Show your reasoning" to any prompt that involves logic or calculation.

Why it works: Models generate text one token at a time. When they write out intermediate reasoning steps, each step becomes part of the context for the next — which means the reasoning quality of subsequent steps improves. Forcing visible chain-of-thought surfaces this benefit.

When to use it: Math problems, logical puzzles, code debugging, multi-step analysis, anything where "just give me the answer" could lead to confident but wrong outputs.
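In practice this is often just a reusable suffix plus a convention for where the final answer goes, so you can extract it cleanly. A small sketch (the `Answer:` convention is an assumption for illustration, not a standard):

```python
# Reusable chain-of-thought suffix plus a convention for the final line.
COT_SUFFIX = (
    "\n\nThink through this step by step before giving your final answer. "
    "Put the final answer on its own last line, prefixed with 'Answer:'."
)

def with_cot(prompt: str) -> str:
    """Append a chain-of-thought instruction to any reasoning prompt."""
    return prompt + COT_SUFFIX

def extract_answer(response: str) -> str:
    """Pull the final answer out of a step-by-step response."""
    for line in reversed(response.splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return response.strip()  # fall back to the whole response
```

Asking for the answer on a marked final line means downstream code never has to parse the reasoning itself.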

3. Few-Shot Examples

Showing the model what good output looks like — with 2–5 examples — is one of the most reliable ways to constrain style, format, and quality. This works especially well for:

  • Writing in a specific voice or tone
  • Generating structured data in a specific schema
  • Classification tasks
  • Transformations (rewriting, reformatting)

Example structure:

Here are examples of the output I want:

Input: [example 1 input]
Output: [example 1 output]

Input: [example 2 input]
Output: [example 2 output]

Now do the same for:
Input: [your actual input]

The more your examples match the style and quality you want, the better the output will be. This technique essentially "shows" the model what you mean rather than trying to describe it.
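The template above is mechanical enough to generate from data. A sketch that assembles it from a list of (input, output) pairs (the helper name and structure are illustrative):

```python
def build_few_shot_prompt(examples, new_input: str) -> str:
    """Assemble the Input/Output template above from (input, output) pairs."""
    parts = ["Here are examples of the output I want:", ""]
    for inp, out in examples:
        parts += [f"Input: {inp}", f"Output: {out}", ""]
    parts += ["Now do the same for:", f"Input: {new_input}"]
    return "\n".join(parts)
```

Keeping examples as data makes it easy to swap in 2–5 of your best ones per task, or to A/B test which example set constrains the output most reliably.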

4. Role Assignment (Done Right)

The classic "act as a [role]" prompt has been diluted — models are increasingly trained to respond helpfully regardless of role framing. But role assignment still works when you use it to set context and perspective, not just to unlock capabilities.

Less effective: "Act as an expert software engineer."

More effective: "You're reviewing this code as part of a security audit for a financial services company. Focus on authentication flows, data exposure risks, and input validation. Flag anything that would fail a SOC 2 Type II audit."

The difference: specific context and constraints, not just a label. The role should change what the model attends to, not just what it claims to be.
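One way to enforce this discipline is to make the concrete context mandatory. A sketch where the role prompt cannot be built without a setting, focus areas, and a standard (the function and parameter names are hypothetical):

```python
def build_review_prompt(code: str, *, setting: str,
                        focus: list[str], standard: str) -> str:
    """Compose a role prompt from concrete context rather than a bare label.

    Keyword-only arguments force the caller to supply real context:
    no setting, focus, or standard means no prompt.
    """
    return (
        f"You're reviewing this code as part of {setting}. "
        f"Focus on {', '.join(focus)}. "
        f"Flag anything that would fail {standard}.\n\n{code}"
    )
```

Example call, mirroring the security-audit prompt above: `build_review_prompt(code, setting="a security audit for a financial services company", focus=["authentication flows", "data exposure risks", "input validation"], standard="a SOC 2 Type II audit")`.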

5. Explicit Constraints and Anti-Patterns

Tell the model what to avoid, not just what to do. This is underused and highly effective.

Examples of constraints that improve output quality:

  • "Don't use filler phrases like 'certainly', 'of course', or 'great question'"
  • "Don't add a conclusion paragraph that summarizes what you just said"
  • "Don't hedge every statement — state your view directly"
  • "Don't use passive voice"
  • "Don't explain what you're about to do — just do it"

These constraints address the most common AI writing patterns that make output feel generic. Combine 3–5 of them to significantly change the character of AI-generated text.
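Constraint lists also pair well with a simple output check: if a banned pattern slips through anyway, you can retry or flag the response. A minimal sketch along those lines (the phrase list and helper names are illustrative):

```python
# Anti-patterns to ban in the prompt and to detect in the output.
FILLER_PHRASES = ["certainly", "of course", "great question"]

def build_constraints(constraints: list[str]) -> str:
    """Turn a list of anti-patterns into an explicit constraint block."""
    return "Constraints:\n" + "\n".join(f"- {c}" for c in constraints)

def violates_filler_ban(text: str) -> list[str]:
    """Return any banned filler phrases that slipped into the output."""
    lowered = text.lower()
    return [p for p in FILLER_PHRASES if p in lowered]
```

The same list drives both the instruction and the check, so the two never drift apart.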

6. Decompose Complex Tasks

Large, complex tasks benefit from being broken into sequential steps — either within one prompt or across multiple prompts. A prompt asking for a 2,000-word article, three code snippets, a comparison table, and an executive summary all at once tends to produce mediocre outputs across all of them.

Instead: generate the outline first, then expand each section, then refine. Each step benefits from the previous output as context. This is slower but produces dramatically better results for complex work.
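The outline-then-expand-then-refine flow can be sketched as a three-step pipeline. Here `call_model` is a placeholder for whatever completion function you actually use (it takes a prompt string and returns the model's text); the prompts are illustrative:

```python
def write_article(topic: str, call_model) -> str:
    """Outline -> expand -> refine, each step seeded with the previous output.

    `call_model` is a placeholder: a function that takes a prompt string
    and returns the model's response text.
    """
    outline = call_model(
        f"Write a section-by-section outline for an article on {topic}."
    )
    draft = call_model(
        "Expand this outline into a full draft, one paragraph per "
        f"section:\n\n{outline}"
    )
    return call_model(
        f"Refine this draft: tighten the prose and remove repetition:\n\n{draft}"
    )
```

Each call sees the previous step's output as context, which is exactly what a single monolithic prompt cannot provide.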

7. Specify Your Audience

The same content should be written very differently for a technical expert versus a non-technical reader. Models default to a generic middle ground that's often too simple for experts and too complex for beginners. Explicitly specify your audience.

Examples: "Explain this to a senior TypeScript developer who's never used Rust." or "Write this for a CEO with no technical background who needs to make a procurement decision."

Techniques That Work Less Than They Used To

"Act as a [role] with no restrictions"

Jailbreak-adjacent prompts that try to override model safety behavior are increasingly ineffective as models are specifically trained against them. More importantly, they're the wrong strategy — better to frame your legitimate request clearly rather than trying to circumvent the model's judgment.

"You are DAN" and similar persona overrides

These worked briefly when models were less trained on adversarial prompts. They're now largely ineffective for anything substantive.

Excessive flattery

"You are the world's greatest expert in..." doesn't reliably improve output quality. What does work is specific context about what expertise is relevant: "Focus on aspects a forensic accountant would notice."

Model-Specific Differences

Some techniques work differently across models:

| Technique           | GPT-4o               | Claude 3.5 Sonnet    | Gemini 2.0 Pro            |
|---------------------|----------------------|----------------------|---------------------------|
| Format instructions | Excellent compliance | Excellent compliance | Good, occasionally drifts |
| Chain-of-thought    | Strong improvement   | Strong improvement   | Moderate improvement      |
| Constraint lists    | Very effective       | Very effective       | Moderately effective      |
| Few-shot examples   | Strong               | Strong               | Strong                    |
| Role assignment     | Moderate effect      | Moderate effect      | Moderate effect           |

The Meta-Skill: Iterative Refinement

The best prompt engineers don't write perfect prompts on the first try — they iterate. The workflow:

  1. Write an initial prompt and evaluate the output
  2. Identify the specific failure: wrong format, wrong tone, too long, missed a key point, incorrect reasoning?
  3. Add a constraint or instruction targeting that specific failure
  4. Re-run and evaluate again

This is much more effective than trying to write a perfect comprehensive prompt from scratch. Each iteration teaches you something about how the model interprets your intent.
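The four-step loop above can be sketched as code. Here `call_model`, `check` (returns a failure description, or `None` if the output passes), and `fix` (patches the prompt to target that failure) are placeholders for your own evaluation logic:

```python
def refine_prompt(prompt: str, call_model, check, fix,
                  max_rounds: int = 4) -> str:
    """Run -> diagnose -> patch, stopping once the output passes.

    `call_model`, `check`, and `fix` are placeholders:
      check(output)       -> failure description string, or None if OK
      fix(prompt, failure) -> a patched prompt targeting that failure
    """
    for _ in range(max_rounds):
        output = call_model(prompt)
        failure = check(output)
        if failure is None:
            return prompt  # this prompt reliably produces passing output
        prompt = fix(prompt, failure)
    return prompt  # best effort after max_rounds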

Prompting Across Multiple Models

One underused technique: run the same prompt through multiple models and compare. This reveals which techniques produce consistent improvement across models (strong signals) versus those that only work on one model (weaker signals). When Claude, GPT-4o, and Gemini all respond better to a modified prompt, you've found a genuinely effective technique — not just a quirk of one model.

Deepest makes this comparison easy: you can run the same prompt across all three simultaneously and iterate on it once to improve every response.

Frequently Asked Questions

What is the most important prompt engineering technique?

Explicitly specifying the output format. It's the single change that most reliably improves output quality across all models. Be specific: word count, structure, what to include and exclude.

Does prompt engineering matter less with newer models?

No — newer models are more responsive to good prompts, not less. The ceiling on what good prompting can achieve has risen with each model generation.

How long should a prompt be?

As long as it needs to be to specify exactly what you want. Many effective prompts are 2–5 sentences. Complex tasks benefit from longer prompts with examples and explicit constraints. Padding and repetition don't help.

What's the difference between a system prompt and a user prompt?

The system prompt sets persistent context and instructions for the entire conversation. The user prompt is your specific request. For applications and APIs, use system prompts for standing instructions (tone, format, role). For one-off tasks, everything can go in the user prompt.
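Most chat APIs express this split as a list of role-tagged messages. A minimal sketch of that common shape (the example instruction text is illustrative):

```python
def build_messages(system: str, user: str) -> list[dict]:
    """The message shape most chat APIs accept: standing instructions go
    in the system message, the specific request in the user message."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_messages(
    system="You are a concise technical editor. Always answer in markdown.",
    user="Rewrite this paragraph to be half as long: ...",
)
```

The system message persists across turns, so tone and format instructions stated there don't need to be repeated in every user message.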


See it for yourself

Run any prompt across ChatGPT, Claude, Gemini, and 300+ other models simultaneously. Free to try, no credit card required.

Try Deepest free →
