BeginnerClaude Code

Fast mode: when speed matters more than depth

/fast trades a little reasoning for a lot of latency. Here's when that's the right trade.

Fast mode runs Claude Opus 4.6 with optimized output speed. Same intelligence ceiling for short tasks; the difference shows up at the long-task scale. The rules for when to flip the switch.

6 min read
claude-codeperformanceworkflowfast-mode

Claude Code has a /fast toggle. It switches the active model to Claude Opus 4.6 with output speed prioritized — same model family Anthropic used for sub-second iterative coding workflows. For some tasks this is a clear win. For others it's a quiet downgrade.

What /fast actually changes

  • Token output speed: roughly 2x faster end-to-end.
  • Reasoning depth: similar to non-fast for short tasks, but you'll feel the difference on long-horizon planning.
  • Total cost per task: usually lower, because shorter wall-clock = fewer tokens spent on filler / re-explaining.
  • It only works on Opus 4.6 — toggling it doesn't move you to a smaller model. (Common misconception.)

When to flip it on

Editing tasks where you already know what you want. Renaming a variable across files. Adding a single new endpoint that mirrors an existing one. Writing test cases for a function you've already specced. The output is short, the reasoning is shallow, the win is latency.

Tight feedback loops. You're iterating on UI copy, prompt wording, or a regex. You're going to run Claude 20 times in 10 minutes. Cumulative latency adds up; fast mode pays for itself in the first three iterations.

Tool-heavy workflows where the bottleneck is human review. When Claude calls a Bash command, your time-to-next-decision is gated by Claude's response speed, not its thinking depth. Fast mode shortens that gap.

When to leave it off

New design work. When you're sketching how to architect a system, you want Claude reasoning hard. Fast mode shaves milliseconds off the wrong decisions.

Debugging hard bugs. "Why does this race condition only happen on staging?" — give Claude the slow lane and the bigger reasoning budget.

Long planning sessions. /plan mode benefits from depth. Fast mode is fine for executing the plan, but write the plan in non-fast mode.

Reviews that gate ship decisions. /review and /security-review should be slow and careful, not fast and shallow.

A simple rule

If you'd accept a junior engineer doing this task in 10 seconds, use fast mode. If you'd want a senior engineer to spend 5 minutes thinking, don't.

You can toggle mid-session. Common pattern: /plan (off), approve plan, /fast, do the work, /fast (off), /review. Each phase gets the right depth.

What it isn't

Fast mode is not "quality reduced." It's "output prioritized for speed at the same intelligence tier." The difference shows up:

  • On long, branching tasks where depth-first thinking matters.
  • On ambiguous prompts where the right answer requires re-reading context multiple times.
  • On synthesis ("based on these 8 files, what's the pattern?") more than execution ("apply this rename to those 8 files").

Don't use it as a default; pick it deliberately when the task fits.

Where to go next

  • See 10 Claude Code slash commands/fast is one of them. /plan, /review pair naturally with non-fast.
  • For batch-style work where you want fast outputs at scale, look into the Claude API Batch endpoint — same speed/cost tradeoff at the API tier.
  • Pair fast mode with prompt caching for tool-heavy agent loops where the same context repeats.

Keep learning

Apply this with Waymaker

Get this article surfaced where you work

Inside Waymaker, this article shows up next to the right Signal page — so the lesson lands when you need it, not before.

No credit card required.