I hit Claude's usage limit three times in one day last week.

Not a huge research project. Not a coding sprint. Normal work. A few Cowork sessions. Some writing. Some debugging in Claude Code. Then at 3pm, the limit screen.

I sat down and checked what was actually eating my tokens. The math was uglier than I expected.

Here's the part most people miss: every message you send to Claude re-reads the full conversation before it responds. Message 1 costs a few thousand tokens. Message 30 costs roughly 100,000 tokens, because Claude re-reads the 29 messages before it even starts thinking about the new one.

Your conversation is a running tab that gets more expensive every time you add to it.

Most habits in this guide come back to that single idea: stop paying for context you aren't using.
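To see how fast that compounds, here's a back-of-envelope model. The 3,500-tokens-per-exchange figure is a made-up average for illustration (long replies and tool output inflate it); the shape of the curve is the point, not the exact numbers:

```python
# Back-of-envelope model of cumulative context cost.
# Assumes each exchange (your message + Claude's reply) adds
# ~3,500 tokens of history -- an illustrative average, not a real rate.
TOKENS_PER_EXCHANGE = 3_500

def context_read_at(message_n: int) -> int:
    """Tokens of history Claude re-reads before answering message N."""
    return (message_n - 1) * TOKENS_PER_EXCHANGE

def total_tokens_read(messages: int) -> int:
    """Cumulative tokens read across the whole conversation."""
    return sum(context_read_at(n) for n in range(1, messages + 1))

print(context_read_at(30))    # 101,500 tokens re-read for message 30 alone
print(total_tokens_read(30))  # 1,522,500 tokens read across all 30 messages
```

Message 30 costs roughly what the whole first half of the conversation cost put together. That's the quadratic growth the rest of this guide is fighting.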

I split this into two parts. Cowork is the chat side. Claude Code is the terminal side. Both burn tokens. Both have specific fixes.

First, the framework.

Two numbers run your life with Claude:

  1. Context window. Roughly 200K tokens per conversation. Everything counts: your messages, Claude's replies, file uploads, tool outputs, system instructions.

  2. Usage limit. How much you can send across all conversations in a 5-hour window (plan-dependent).

The mental model: context is like RAM on your computer. You wouldn't load every file on your drive into memory to edit one document. You load what's needed, release what isn't.

Most Claude users work the opposite way. One chat for a week. Same PDF dropped into five sessions. Extended Thinking on by default. Every connector enabled at all times.

All of that compounds.

Here's how to reverse it.

Part 1: Cowork

1. Pick the right model before you start.

You don't drive a semi-truck to the grocery store.

Claude has three tiers. Haiku is fast and cheap. Sonnet is the daily workhorse. Opus is the deep reasoning model, roughly 5x the cost of Sonnet.

Default rule: start on Sonnet. Escalate to Opus only when you hit something that actually needs it.

When Opus earns its cost:

→ Multi-step architectural decisions.

→ Complex financial, legal, or scientific reasoning.

→ Writing that needs real nuance (scripts, speeches, long-form essays).

When Sonnet handles it fine:

→ Drafting emails, posts, summaries.

→ Research recaps.

→ Code that isn't architecturally tricky.

→ Any single-step task.

When Haiku is the right pick:

→ File searches.

→ Simple lookups.

→ Formatting and cleanup.

→ Anything where the answer takes Claude under 30 seconds.

Model switching is one click. Make it reflex.

2. Kill PDFs before you upload them.

This is the single biggest token trap in Cowork.

When you upload a PDF, Claude doesn't just extract the text. It also rasterizes every page into an image and loads both into context. A single page can run 1,500 to 3,000 tokens depending on density.

A 15-page PDF? Roughly 30,000 to 45,000 tokens before you've typed a word.

The same text as a .md file runs closer to 2,000 tokens.

The workflow:

  1. Open the PDF. Select all the text. Copy.

  2. Paste it into a Google Doc.

  3. File → Download → Markdown (.md).

  4. Upload that instead.

Thirty seconds. Ninety percent of the token cost gone.

Screenshots follow the same logic. A raw screenshot runs around 1,300 tokens. Cropped tight to the thing that matters, under 100.
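If you want to estimate the cost yourself: Anthropic's docs put image tokens at roughly width × height / 750. A quick sketch (the screenshot dimensions below are illustrative):

```python
def image_tokens(width_px: int, height_px: int) -> int:
    """Approximate token cost of an image, per Anthropic's
    published estimate of (width * height) / 750."""
    return (width_px * height_px) // 750

full = image_tokens(1280, 800)  # a full laptop screenshot: ~1,365 tokens
crop = image_tokens(400, 120)   # cropped to the error message: 64 tokens
```

Same information in the crop, at about 5% of the cost.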

I used to drop 40-page academic papers straight into Cowork without thinking. Then I tracked my usage for a week and stopped.

3. Plan in Chat. Build in Cowork.

Cowork reads your folders at session start and writes files to disk. That's what makes it powerful. It's also what makes it expensive. Every session loads your context files, and file creation (spreadsheets, docs, presentations) spends more of your limit than a regular reply.

So stop brainstorming in the heavy tool.

Open Chat first. Work out the structure of what you want. Argue with Claude about the sections. Agree on the assumptions. Refine the outline. All of that exploration happens with minimal context loaded.

Once you know exactly what you want, open Cowork, paste the final plan, and say:

Build this exactly. No changes, no additions. Just turn this plan into the file:
[final plan]

The thinking happens in the light tool. The execution happens in the heavy tool.

4. Let Claude ask you questions.

The worst prompt is a 400-word prompt.

It feels thorough. It feels prepared. But Claude re-reads all 400 words every turn, and half of them were context you didn't need to provide upfront.

Flip the script. Write a short prompt that ends in a specific instruction:

I want to [task] to [outcome].
Read my folder.
Ask me questions using AskUserQuestion before you start.

Claude reads what's relevant, then generates a clickable form with the questions it actually needs answered. You click. Sometimes you type a short line. The form does the structuring work for you.

Compare the cost:

→ Your 400-word prompt: 400 tokens re-read every turn.

→ The AskUserQuestion version: 30-token prompt plus four clicks.

Less context loaded. Better answers, because Claude is asking for the specific things it needs instead of guessing what you meant in paragraph three.

This is the single most underused feature in Cowork. I teach it to my students in the first week of class.

5. Edit. Restart. Surgical changes only.

Three habits that stack.

Edit, don't follow up. In Chat, click the edit button on your original message instead of sending "no wait, I meant..." as a new turn. The edit replaces the old exchange. A follow-up stacks on top of it.

Restart, don't keep going. In Cowork, you can't edit a past prompt, but you can click Restart the conversation from here on any earlier message. The further back you restart, the more history you shed.

Surgical, not wholesale. When one section of a long output is wrong, don't say "redo the whole thing." A 2,000-token report regenerated is 2,000 output tokens spent again.

Say instead:

Only change section 3. Leave every other section exactly as is.

Claude regenerates just the broken part. The other 1,800 tokens stay in place.

6. Cut the preamble from Claude's output.

Claude is verbose by default. "Great question! Here's what I did..." "Let me walk you through this..." "Sure, I can help with that!" All of that is output tokens you pay for.

Add this to any prompt where you know what you want:

Just the output. No preamble, no explanation,
no recap unless I ask.

For longer workflows where Claude usually summarizes everything it did at the end:

Skip the summary. I'll read the output.

Output tokens cost as much as or more than input tokens on every model. Trimming Claude's verbosity is pure savings, and it takes two lines in a prompt.

7. Turn off what you're not using.

Web search. Connectors. Extended Thinking. Explore mode.

Each one adds tokens to every response, even when it doesn't fire. Tool definitions sit in your context whether you called the tool or not.

The setting that matters most: click the + in the lower left, hover over Connectors, then Tool access. Switch it from Tools already loaded to Load tools when needed. Claude finds the right tool when it actually needs one instead of seeing all fifteen upfront.

Extended Thinking is the other big one. It's designed for architectural decisions, tricky debugging, and multi-step logical problems. It adds reasoning tokens to every response.

For drafting, summarizing, reformatting, simple code: turn it off. The standard model handles those fine.

8. Projects, new chats, and a one-line session note.

Three context-hygiene moves that work together.

Projects for anything recurring. If you upload the same document to a new chat more than once, it belongs in a Project. Upload once. Every chat inside the Project reads it without re-tokenizing. On paid plans, Projects also use retrieval, pulling only the relevant chunks instead of the full file.

New chat per topic. One chat for a LinkedIn post. A separate chat for the client proposal. A separate one for the dinner recipe. Every new question re-reads everything above it. Mixing topics means paying for context that does nothing for the current task.

One-line session notes. Before you end a long Cowork session:

Write a session-notes.md with the key decisions,
the current state, and what we're doing next.

Next session, open a new chat and paste:

Read session-notes.md first. We're continuing from there.

You keep the direction. You ditch the bloat.

Part 2: Claude Code

Claude Code burns tokens differently. The waste comes from file reads, tool outputs, and sessions that run too long.

Here's the fix.

1. Run /context first.

Before you do anything else in a session, type /context.

It shows exactly what's in your window: system prompt, MCP tools, CLAUDE.md, auto memory, messages, free space.

The biggest surprise for most people: MCP tools. Every connected server leaves its tool definitions in context, even when idle. Run /mcp to see what's active. Disable what you don't need.

CLI tools like gh, aws, gcloud, sentry-cli don't load anything into context at all. If a command-line tool can do the job, prefer it over an MCP server.

2. Keep CLAUDE.md lean.

CLAUDE.md loads at the start of every session. Every token in it gets paid for every time you open Claude Code.

Most developers treat it like a knowledge base. Full architecture docs. Historical decisions. Redundant context Claude could infer by reading the code itself.

Waste.

What belongs in CLAUDE.md:

→ Tech stack (one line).

→ Code style conventions that aren't obvious from the code.

→ Paths to key directories.

→ Active TODOs and known bugs.

→ Compact instructions (a short section telling Claude what to preserve when /compact runs).

What doesn't belong:

→ Full API documentation.

→ Project history.

→ Architecture diagrams.

→ Anything Claude can read from a file when it needs it.

Anthropic's guidance: stay under 200 lines. My own rule is tighter: under 100.
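For a concrete picture, here's a sketch of a lean CLAUDE.md. Every project detail below is invented for illustration:

```markdown
# CLAUDE.md

## Stack
Python 3.12, FastAPI, Postgres, pytest.

## Conventions
- Prefer dataclasses over dicts for internal payloads.
- All DB access goes through app/db/repo.py.

## Key paths
- API routes: app/api/
- Tests: tests/ (run with `pytest -q`)

## Active TODOs
- Migrate auth off the legacy session table (known flaky test: test_refresh)

## Compact instructions
When compacting, keep: current task, failing tests, decisions made.
Drop: file contents already committed.
```

Twenty-odd lines. Everything else, Claude reads from the repo when it actually needs it.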

One upside: CLAUDE.md gets prompt-cached automatically. After the first message in a session, it's roughly 90% cheaper to re-read. Keeping it stable (no mid-session edits) preserves the cache.

3. Use .claudeignore like .gitignore.

Claude will read and index files you never want it to touch. Build artifacts. Lock files. Generated code. Vendored dependencies.

Create a .claudeignore at the root of your project:

node_modules/
dist/
build/
target/
.next/
*.lock
*.min.js
__pycache__/
coverage/

A 5,000-line lockfile Claude decides to read is 20,000+ tokens. Multiply by every codebase exploration. The savings compound.

4. /clear vs /compact: know when to use each.

Two tools for shedding context. They behave very differently.

/clear wipes the conversation entirely. Zero messages, fresh session. Fast, free, no summary.

/compact asks Claude to summarize the conversation so far and replaces the history with that summary. You keep the trajectory of the work. Costs time and tokens to generate the summary. A 70K-token session compresses to roughly 4K.

Decision rule:

→ Switching to an unrelated task → /clear.

→ Mid-flow on the same work, context getting heavy → /compact.

→ Context poisoned with wrong assumptions Claude keeps reverting to → /clear.

Two things most people miss on /compact:

First, pass it a focus instruction. Generic compacts keep too much fluff. Targeted compacts keep only what matters:

/compact focus on the auth refactor, drop the test debugging

Second, run it at 60% context, not 95%. When the window is nearly full, Claude is already in a degraded state, and the summary it produces reflects that.

5. Delegate research to subagents.

If a task requires reading 15+ files to understand something, don't do it in your main conversation. Those files will sit in your context for the rest of the session.

Use a subagent:

Use subagents to investigate how our authentication system
handles token refresh. Report back with a summary of what
you found, the file paths involved, and any utilities I
could reuse.

The subagent runs in a separate context window. It reads the files, figures out the answer, and returns a summary. Your main conversation stays clean.

The mental test: will I need the raw output again, or just the conclusion? If only the conclusion, it's a subagent task.

I use subagents most often for codebase exploration and for verifying work after Claude implements a feature. Both are high-file-read, low-context-retention tasks.

6. opusplan: plan with Opus, execute with Sonnet.

The advisor pattern. Start Claude Code with:

claude --model opusplan

Opus runs the planning phase, where deep reasoning pays off. Sonnet runs execution, which is where most of the tokens get spent.

For a complex refactor, this alone cuts costs 2-5x. Planning is a small fraction of total work, and Sonnet is five times cheaper than Opus.
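The arithmetic behind that claim, with illustrative token counts (relative prices only, not real billing rates):

```python
# Illustrative cost model for opusplan vs. all-Opus on one refactor.
# Relative prices: Sonnet = 1 unit per token, Opus = 5 units per token.
OPUS, SONNET = 5, 1

def refactor_cost(plan_tokens: int, exec_tokens: int,
                  plan_price: int, exec_price: int) -> int:
    """Total cost when planning and execution run on different models."""
    return plan_tokens * plan_price + exec_tokens * exec_price

# Assume a 200K-token refactor where planning is 10% of the work.
all_opus = refactor_cost(20_000, 180_000, OPUS, OPUS)    # 1,000,000 units
opusplan = refactor_cost(20_000, 180_000, OPUS, SONNET)  # 280,000 units
print(all_opus / opusplan)  # ~3.6x cheaper
```

The smaller the planning share, the closer the savings get to the full 5x price gap.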

For smaller tasks: default to Sonnet. Only escalate when Sonnet gets stuck.

For custom subagents: set the model field to sonnet or haiku in the agent's definition file. Haiku is 3x cheaper than Sonnet and handles file searches, formatting, and simple lookups at near-frontier quality.
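A sketch of a custom subagent pinned to Haiku, assuming the `.claude/agents/` markdown-with-frontmatter format (the agent name and prompt are invented; field names may differ by version):

```markdown
---
name: file-finder
description: Locates files and symbols in the repo and reports paths only.
model: haiku
tools: Glob, Grep, Read
---

You are a lookup agent. Find what was asked for and reply with
file paths and one-line descriptions. Do not paste whole files
into your answer, and never propose code changes.
```

Because the heavy file reads happen inside this agent's own context window on the cheapest model, the main conversation pays only for the short report it sends back.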

7. Be specific. Use Skills for the rest.

Two things to combine.

Be specific. "Fix the authentication bug" makes Claude explore your entire codebase looking for auth code. Every read costs tokens. "Fix the off-by-one bug in utils.py line 12 that causes the last item to be skipped" tells Claude exactly where to go.

Every time you know the file, use it. Every time you know the error, paste it. Every time you know the line number, include it.

Specificity saves tokens and time. A rare win-win.

Skills for repeatable workflows. The difference between CLAUDE.md and a Skill:

→ CLAUDE.md loads every session.

→ Skills load only when invoked.

If a workflow applies to some sessions but not all (a code-review pattern, a specific refactor approach, a testing format), don't put it in CLAUDE.md. Run /skill-creator. Build a skill. The skill's description sits in context (a few tokens). The full instructions only load when it fires.

This keeps CLAUDE.md tight and pushes specialized workflows into pay-per-use mode.

The decision tree.

If you take nothing else from this guide:

In Cowork:

→ Convert PDFs and screenshots before uploading.

→ Plan in Chat. Build in Cowork.

→ Keep prompts short. End with "ask me questions using AskUserQuestion."

→ Edit original messages. Redo specific sections, never the whole output.

→ Turn off Extended Thinking by default.

In Claude Code:

→ Run /context before blaming Claude for being slow.

→ Keep CLAUDE.md under 100 lines.

→ /clear for new tasks. /compact (with a focus instruction) for continuity.

→ Delegate research to subagents.

→ Use opusplan for complex refactors.

The theme, on both sides: your context window is a resource. Treat it like one.

Load what you need. Release what you don't. Repeat.

Every token you save is a message you didn't have to skip at 3pm.

If this changes how you use Claude, send it to one person on your team still dropping PDFs raw into every chat. That's the favor I'm asking.
