
The Tool Tax: How Claude Code, Cursor, Copilot, and Codex Handle 200 MCP Tools

The four major coding agents converged on the same architectural pattern for dynamic MCP tool discovery — and shipped four incompatible implementations. One of them hasn't shipped it at all. Here's what that costs in tokens and latency, and which workloads each agent can actually run.

The Bottom Line

It is now normal for a developer to wire 5–15 MCP servers into a coding agent, exposing 100–300 tools. Loading every tool schema upfront on every turn imposes a steep tool tax: 50–140K tokens of input, slower startup, degraded selection accuracy past ~30–50 visible tools, and — for some providers — hard rejection at 128 tools.

Claude Code, Cursor, OpenAI Codex CLI, and GitHub Copilot have all converged on the same architectural shape (defer schemas; load on demand) but ship four different implementations. The pattern is becoming industry standard. The interfaces are not interoperable. And the defaults differ enough to change which workloads each agent can actually handle without manual intervention.

| Agent | Mechanism | Default | Upfront cost at 200 tools |
|---|---|---|---|
| Claude Code | API-native tool_search_tool | On | ~3–5K tokens |
| Cursor | Files synced to disk + grep | On | ~2–4K tokens |
| GitHub Copilot | "Virtual tools" embedding clusters | Threshold-gated | ~5–10K, or fail at 128 |
| Codex CLI | None (manual allowlists only) | Eager | ~50K tokens / turn |

Why The Tool Tax Hurts

Each MCP tool schema (name, description, JSON-Schema parameters) costs roughly 250–700 tokens depending on verbosity. Measured in the wild (a sketch for measuring your own servers follows the list):

  • mcp-omnisearch (20 tools): ~14,114 tokens — ~706 tokens/tool
  • playwright (21 tools): ~13,647 tokens — ~650 tokens/tool
  • Representative 10-server, 200-tool catalog: 50,000–140,000 tokens (200 tools × 250–700 tokens/tool)
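
If you want to reproduce these numbers for your own servers, a rough estimator is sketched below. It reads the tools array of an MCP tools/list result from stdin and applies a ~4-characters-per-token heuristic, which is approximate and model-dependent; the script is illustrative, not a published tool.

    import json
    import sys

    def estimate_tokens(obj) -> int:
        """Crude ~4 chars/token heuristic; real tokenizers vary by model."""
        return len(json.dumps(obj)) // 4

    # Expects the "result" object of an MCP tools/list response on stdin,
    # e.g. {"tools": [{"name": ..., "description": ..., "inputSchema": ...}]}
    tools = json.load(sys.stdin)["tools"]
    sized = sorted(((estimate_tokens(t), t["name"]) for t in tools), reverse=True)
    total = sum(n for n, _ in sized)
    print(f"{len(tools)} tools, ~{total} tokens of schema upfront")
    for n, name in sized[:10]:  # the ten most expensive schemas
        print(f"  ~{n:>5}  {name}")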

Three failure modes compound at scale:

  1. Context displacement. On a 200K-token window, eager loading consumes 25–70% of context before the agent reads a single file.
  2. Selection accuracy collapse. Models lose tool-selection accuracy past ~30–50 visible tools. With 200, they reliably pick the wrong tool or miss the right one.
  3. Hard API limits. GPT-4.1 and current Copilot agent mode enforce a 128-tool ceiling. Requests above the ceiling are rejected before execution.

The fix is structurally the same everywhere: keep a tiny persistent surface (a search tool, a file index, cluster summaries) and expand full schemas on demand. How each agent implements that is where it gets interesting.
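
Stripped of vendor specifics, the shared shape fits in a few lines of Python. Everything below is hypothetical illustration, not any vendor's API; real implementations back the search with regex, BM25, embeddings, or a file index rather than substring matching.

    # Defer-and-expand in miniature. Every name here is hypothetical.
    # Full schemas live outside the context window; only the search tool
    # is visible on every turn.

    TOOL_CATALOG = {
        "matrix_query_run": {
            "description": "Run a saved query against the metrics warehouse.",
            "parameters": {"type": "object", "properties": {}},
        },
        # ... ~200 more entries
    }

    def persistent_surface() -> list[dict]:
        """The only schema the model always sees: one small search tool."""
        return [{
            "name": "search_tools",
            "description": "Find tools by keyword; returns matching names.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }]

    def search_tools(query: str, limit: int = 5) -> list[str]:
        """Cheap lexical match (real systems use regex, BM25, or embeddings)."""
        q = query.lower()
        return [name for name, schema in TOOL_CATALOG.items()
                if q in name or q in schema["description"].lower()][:limit]

    def expand(names: list[str]) -> list[dict]:
        """Full schemas enter the context only for the tools actually needed."""
        return [{"name": n, **TOOL_CATALOG[n]} for n in names]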

Claude Code: API-Native Tool Search

Anthropic ships tool_search_tool_regex_20251119 and tool_search_tool_bm25_20251119 as first-class API features. In Claude Code, MCP tools are deferred and discovered on demand by default; the agent sees only the search tool plus any tools the user pinned. When the agent needs a capability, it issues a regex or BM25 query, the API returns 3–5 matching tool_reference blocks, and those expand into full schemas only at the moment of use.
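
The same machinery is reachable directly from the Messages API. Here is a minimal sketch of what such a request might look like; the beta flag and the defer_loading field spelling are assumptions taken from Anthropic's tool-search documentation and should be verified against current docs, and the deferred tool itself is invented for illustration.

    import anthropic

    client = anthropic.Anthropic()

    response = client.beta.messages.create(
        model="claude-sonnet-4-5",               # tool search needs Sonnet/Opus 4+
        max_tokens=1024,
        betas=["advanced-tool-use-2025-11-20"],  # assumption: current beta flag
        tools=[
            # The search tool is the only schema loaded upfront.
            {
                "type": "tool_search_tool_regex_20251119",
                "name": "tool_search_tool_regex",
            },
            # A deferred tool (invented for illustration): its name and
            # description are indexed, but the schema expands only on demand.
            {
                "name": "matrix_query_run",
                "description": "Run a saved query against the metrics warehouse.",
                "input_schema": {"type": "object", "properties": {}},
                "defer_loading": True,           # assumption: field spelling
            },
        ],
        messages=[{"role": "user", "content": "Pull last week's quality metrics."}],
    )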

Limits worth knowing:

  • Catalog ceiling: 10,000 tools.
  • Tool descriptions and server instructions truncated at 2KB each.
  • Disabled by default on Vertex AI and on non-first-party proxy hosts (the tool_search beta header is not forwarded).
  • Requires Sonnet 4+ or Opus 4+; Haiku models do not support tool search.

For 200 tools, this collapses the upfront tool cost from ~50K to ~3–5K — a >90% reduction. Anthropic's published figure is >85% across typical multi-server setups.

Cursor: The File System Is The Index

Cursor explicitly rejected the dedicated-search-tool approach. Instead, it syncs MCP tool descriptions to a folder on disk. The agent receives only tool names in static context and discovers full schemas by grep-ing or semantic-searching the synced folder when a task calls for it.
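
The mechanics are easy to picture. In this sketch the folder layout and file format are invented for illustration; Cursor's actual on-disk schema is not documented here.

    import json
    from pathlib import Path

    TOOLS_DIR = Path(".agent/mcp-tools")    # hypothetical path, not Cursor's

    def sync_tool(server: str, name: str, schema: dict) -> None:
        """One file per tool: the folder itself becomes the discovery index."""
        path = TOOLS_DIR / server / f"{name}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(schema, indent=2))

    def discover(keyword: str) -> list[Path]:
        """What the agent's grep effectively does over the synced folder."""
        return [p for p in TOOLS_DIR.rglob("*.json")
                if keyword.lower() in p.read_text().lower()]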

This is part of a broader "everything is a file" strategy: Cursor applies the same primitive to Agent Skills, long terminal output, and oversized tool responses. The file surface unlocks capabilities the API-level approach can't offer — notably, surfacing MCP server status (re-authentication needed, server unhealthy) without forgetting the tools entirely.

A/B test results published in January 2026 reported a 46.9% reduction in total agent tokens on runs that called an MCP tool. Schema-only reduction is higher; the 46.9% is across all session content.

GitHub Copilot: Virtual Tools

Copilot took a third path: embedding-guided clustering. The November 2025 blog post calls the system "virtual tools" — functionally similar tools are grouped using an internal embedding model and cosine similarity, with each cluster summarized by a single model call (cached locally). The agent sees the cluster summaries and expands a cluster only when its task matches.
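
A toy version of the grouping step looks like the following; the embedding model, similarity threshold, and summarization call are internal to Copilot, so every constant and helper here is a stand-in.

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def cluster_tools(descriptions: dict[str, str], embed,
                      threshold: float = 0.8) -> list[list[str]]:
        """Greedy single-pass clustering: attach each tool to the first
        cluster whose centroid is similar enough, else start a new one.
        `embed` stands in for Copilot's internal embedding model."""
        clusters: list[list[str]] = []
        centroids: list[np.ndarray] = []
        for name, desc in descriptions.items():
            v = embed(f"{name}: {desc}")
            for i, c in enumerate(centroids):
                if cosine(v, c) >= threshold:
                    clusters[i].append(name)
                    n = len(clusters[i])
                    centroids[i] = (c * (n - 1) + v) / n   # running mean
                    break
            else:
                clusters.append([name])
                centroids.append(v)
        return clusters

    # Each cluster then gets one cached summarization call; the agent sees
    # only the summaries until a task matches a cluster.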

In published benchmarks (SWE-Lancer, SWE-bench Verified), this approach improved success rates by 2–5 percentage points with both GPT-5 and Sonnet 4.5, and cut response latency by ~400ms.

Two important constraints:

  • 128-tool hard cap. Even with virtual tools enabled, VS Code 1.109 still enforces the 128-tool API limit at request time. Multi-MCP setups exceeding 128 are rejected before the agent can act, regardless of clustering. Open issue as of January 2026.
  • Threshold-gated. Virtual tools activate only once the visible tool count crosses github.copilot.chat.virtualTools.threshold; below it, tools are still loaded eagerly.

OpenAI Codex CLI: No Dynamic Loading

Codex CLI exposes MCP tools alongside built-ins on session start, with no deferral mechanism. The OpenAI API supports tool_search with defer_loading: true on GPT-5.4+, but Codex CLI does not wire it up to MCP tools as of the current release.

Available controls are static; a combined example follows the list:

  • enabled_tools = [...] and disabled_tools = [...] allow/deny lists per server
  • enabled = false to disable a server without removing config
  • Project-scoped .codex/config.toml for narrower scoping
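
Combined in a project-scoped file, that might look like this. Server and tool names are invented, and the [mcp_servers.<name>] table layout is an assumption to check against the Codex CLI reference:

    # .codex/config.toml: per-project working set over a larger global config
    [mcp_servers.playwright]
    enabled = false                    # keep configured, skip at startup

    [mcp_servers.metrics]
    enabled_tools = ["matrix_query_run", "matrix_query_list"]

    [mcp_servers.search]
    disabled_tools = ["crawl_site"]    # everything else stays available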

Known scaling issues with many servers:

  • MCP startup and tool discovery sit on the first-turn critical path. One slow or unhealthy server can stall the session until timeout.
  • Sub-agents inherit the full parent tool set with no scoping. On GPT-4.1 (128-tool API limit), spawning sub-agents in any environment with 150+ tools fails outright. Workarounds require disabling servers globally.

For 200 tools, Codex CLI pays the full ~50K-token cost on every turn. Across a 10-turn session, that's ~500K tokens of pure tool-definition overhead.

Token Economics At 200 Tools

The following figures assume a representative MCP catalog with ~250 tokens per tool schema (conservative; verbose servers run 2–3× higher).

| Agent | Upfront (tokens) | Per-turn (tokens) | 10-turn overhead (tokens) |
|---|---|---|---|
| Claude Code (tool_search on) | 3–5K | 3–5K + ~500 per expansion | 30–50K |
| Cursor (dynamic context discovery on) | 2–4K | 2–4K + ~400 per file read | 20–40K |
| GitHub Copilot (virtual tools, <128) | 5–10K | 5–10K + cluster expansions | 50–100K |
| Codex CLI (no deferral) | 50K | 50K | 500K |

On a 200K-token window, Codex CLI consumes 25% of context before any work starts; with verbose servers, this rises to 70%. The other three agents leave essentially the whole window available for the actual task. Over a multi-turn session, Codex's per-turn cost compounds linearly — what costs $X in the first turn costs ~$10X across ten turns.

Practical Guidance

For multi-MCP workloads (5+ servers, 100+ tools), agent choice materially affects what's possible:

  1. Verify defaults are active. Claude Code: confirm ENABLE_TOOL_SEARCH is not disabled (it silently turns off on proxies and Vertex AI). Copilot: set github.copilot.chat.virtualTools.threshold low enough to activate. Cursor: dynamic context discovery is on by default — no action needed.
  2. For Codex CLI: scope aggressively with project-level .codex/config.toml and enabled_tools allowlists. Treat global MCP config as a superset; only enable per-project what each session actually needs. Expect 30–50 active tools max for healthy operation.
  3. For Copilot at 128+ tools: the cap is enforced at the VS Code layer before the agent processes anything. Virtual tools clustering helps the model reason but does not bypass the API limit. Trim selected tools manually, or split work across sessions.
  4. MCP server design: prefix tool names by server (matrix_query_*, quality_metric_*) so search and clustering have clean signal. Keep tool descriptions concise — Claude Code truncates anything past 2KB silently.

Where The Standard Is Heading

All four implementations solve the same problem and converge on the same architectural shape: a small persistent surface (search tool, file index, cluster summaries) plus on-demand schema expansion. None of the wire formats interoperate. MCP itself ships notifications/tools/list_changed for dynamic catalogs but no standardized search/defer primitive that clients and servers agree on.

The pattern is industry-standard. The interface is not. Anthropic's tool_search_tool_*, OpenAI's tool_search + defer_loading, Cursor's file sync, and Copilot's virtual tools each remain vendor-specific. Until MCP adds a discovery primitive, each agent's behavior at scale will diverge in the ways documented here.

The Draft Angle

Dynamic tool discovery solves the inbound tool-tax problem — how the agent navigates a huge external catalog. It does not solve the outbound problem: how the agent navigates your codebase. That's what Draft's local knowledge graph is for. Deterministic answers to "what calls this function" and "what does this change touch" don't belong in an MCP server at all; they belong in draft/graph/*.jsonl, committed alongside the code. Different problem, same lesson: defer everything, expand on demand, keep the persistent surface small.

Sources

  • Anthropic: tool search tool documentation, Claude Code MCP docs
  • Cursor blog: "Dynamic context discovery" (January 2026)
  • GitHub blog: "How we're making GitHub Copilot smarter with fewer tools" (November 2025)
  • OpenAI Developers: Codex CLI MCP reference, tool_search API guide
  • Real-world MCP tool-cost measurements: Scott Spence, October 2025
  • VS Code issue #290356 (128-tool cap, January 2026)
  • GitHub Copilot CLI issue #2992 (sub-agent tool scoping, April 2026)
  • Codex CLI issue #21318 (MCP startup blocking, May 2026)