Cut Claude Code Costs

Claude Code is a powerful coding tool, but its token usage adds up fast. Three simple tricks cut that usage substantially without compromising performance: splitting work between the Opus and Sonnet models, offloading research and exploration to subagents, and installing the Caveman plugin. Combined, they stretch the usage limits of any Claude Code plan.

Claude Code is arguably the most powerful coding tool available, but the same ease of use that makes it productive also makes it easy to burn through tokens. Most users hit their usage limits through inefficient use, not because the tool itself is expensive. Three small changes can get 3-5x more out of an existing plan.

Optimizing Model Usage

Claude Code offers multiple models, including Opus 4.6 and Sonnet 4.6. Opus 4.6 is the most intelligent, while Sonnet 4.6 is faster and far cheaper. By default, Claude Code uses Opus for everything, including routine execution work that doesn't need it. Running /model opus-plan in a session changes that split: Opus handles only the planning, and Sonnet takes over execution, making the heavy lifting roughly 5x cheaper.
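
In a live session this is a single command typed at the prompt; a minimal sketch, where the second line is a hypothetical example request that Opus will plan and Sonnet will then execute:

  > /model opus-plan
  > Refactor the session-handling module to rotate tokens on login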

Using Subagents

Every message sent to Claude Code carries the full chat history back through the model, so long sessions bloat the context and drive up token usage. Subagents mitigate this. A subagent runs in its own context window, so it can take on heavy reading tasks, such as exploring a codebase or researching a library, without touching the main chat. Using one is as simple as asking: tell Claude Code to delegate the task and it will spin up a subagent automatically. Only the subagent's summary returns to the main context, so that summary is all you keep paying for.
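
What that looks like in practice is just a plainly worded request; this prompt is a hypothetical example, and any phrasing that asks for a subagent works:

  > Use a subagent to explore the src/ directory and summarize how
    authentication is wired up; report back only the key files and call flow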

Installing the Caveman Plugin

The Caveman plugin makes Claude Code respond in terse, caveman-style language, reducing token usage by up to 65%. Install it by running /install caveman, then activate it by typing /caveman and selecting a conciseness level: lite, full, or ultra.
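
A sketch of the full setup as described above (command names are taken from the article; the level is chosen interactively after the second command):

  > /install caveman
  > /caveman
    (select lite, full, or ultra)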

Combining the Tricks

The three tricks compound. The opus-plan split keeps the expensive model on planning only, subagents keep research and exploration out of the main context, and Caveman trims every response. Applied together, they can extend a plan's usage limits several times over without sacrificing the quality of the output.

Conclusion

Claude Code is a powerful tool, but its token usage can quickly add up. By implementing the three tricks outlined above, users can significantly reduce their token usage and get more out of their existing plan. Whether you're a seasoned developer or just starting out, these tricks can help you maximize your Claude Code usage and achieve your coding goals.
