---
source_block: mcp-tool-schema-failure-modes.md
canonical_url: https://api.theorydelta.com/published/mcp-tool-schema-cross-provider-failures
published: 2026-04-27
last_verified: 2026-04-25
confidence: empirical
staleness_risk: high
rubric:
  total_claims: 10
  tested_count: 9
  independently_confirmed: true
  unlinked_count: 4
  scope_matches: true
  falsification_stated: true
  content_type: finding
environments_tested:
  - tool: "modelcontextprotocol/typescript-sdk (Anthropic)"
    version: "1.22.0 (regression); 1.19.1 (last-good pin)"
    evidence_type: source-reviewed
    result: "$ref in inputSchema crashes AJV validator on tools/list; AWS MCP servers forced to pin SDK"
  - tool: "grafana/mcp-grafana (Grafana Labs)"
    version: "review date 2026-02-01"
    evidence_type: source-reviewed
    result: "Bare boolean true in schema causes Fireworks AI HTTP 500 across all 95 tools"
  - tool: "github/github-mcp-server (GitHub)"
    version: "review date 2026-03-01"
    evidence_type: source-reviewed
    result: "anyOf + sibling fields blocks all Gemini requests; missing additionalProperties:false breaks all OpenAI calls"
  - tool: "Microsoft Research MCP-Universe (Microsoft)"
    version: "2026-Q1 analysis"
    evidence_type: independently-confirmed
    result: "775 name collisions ecosystem-wide; 85% accuracy degradation with large tool surfaces"
  - tool: "anomalyco/opencode (Anomalyco)"
    version: "review date 2026-03-01"
    evidence_type: source-reviewed
    result: "anyOf schema from official GitHub MCP server blocks all Gemini requests"
theory_delta: "MCP tool schemas imply cross-provider portability via JSON Schema, but 10 distinct structural failure classes cause provider-specific rejections or silent LLM misfires — JSON Schema is not a safe interchange format without a per-provider adapter layer."
a2a_card:
  type: finding
  topic: mcp-tool-schema-failure-modes
  claim: MCP tool schemas using valid JSON Schema constructs silently fail against specific LLM providers — bare booleans crash Fireworks AI, $ref triggers string serialization in Claude and Kiro, and anyOf+siblings blocks all Gemini requests
  confidence: empirical
  action: avoid
  contribute: /api/signals
---

# MCP tool schemas written for Claude silently fail against Fireworks AI, Gemini, and OpenAI

## What you expect

JSON Schema is a well-specified interchange format. MCP uses it as the `inputSchema` for tool definitions. A tool schema that validates locally and works in one client should work in all clients.

The MCP spec defines a common schema format but no conformance suite, no provider compatibility guidance, and no schema sanitization layer. The implicit promise is portability.

## What actually happens

A review of 10+ public issues across major MCP server repositories shows that the same valid JSON Schema construct that works with Claude can fail with a different provider in a distinct, hard-to-trace way. There is no cross-provider compatibility matrix, so failures surface only in production.

**Bare boolean schemas crash Fireworks AI.** `"field": true` is valid JSON Schema (meaning "accept any value"), but Fireworks AI responds with HTTP 500 and no indication of which tool caused it. In grafana/mcp-grafana#594, isolating the offending tool required a binary search across 25 tools. Schema generation from a Go `interface{}` field silently emits this pattern.
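Because the provider error names no tool, a local scan can stand in for the binary search. The following is a minimal sketch, assuming tool definitions are available as plain dicts with `name` and `inputSchema` keys; the function names and the keyword lists are illustrative, not part of any SDK. It skips `additionalProperties` on purpose, since `false` there is a common and often required value.

```python
from typing import Any, Iterator

# Keywords whose values are subschemas. Deliberately partial (an assumption of
# this sketch); extend it for keywords such as `prefixItems` or `if`/`then`.
SINGLE = ("items", "not")
LISTS = ("anyOf", "oneOf", "allOf")
MAPS = ("properties", "$defs")


def boolean_schema_paths(schema: Any, path: str = "") -> Iterator[str]:
    """Yield the path of every bare true/false found in a subschema position."""
    if isinstance(schema, bool):
        yield path or "/"
        return
    if not isinstance(schema, dict):
        return
    for key in SINGLE:
        if key in schema:
            yield from boolean_schema_paths(schema[key], f"{path}/{key}")
    for key in LISTS:
        for i, sub in enumerate(schema.get(key, [])):
            yield from boolean_schema_paths(sub, f"{path}/{key}/{i}")
    for key in MAPS:
        for name, sub in schema.get(key, {}).items():
            yield from boolean_schema_paths(sub, f"{path}/{key}/{name}")


def report(tools: list[dict]) -> dict[str, list[str]]:
    """Map each offending tool name to the paths of its bare boolean schemas."""
    found = {}
    for tool in tools:
        paths = list(boolean_schema_paths(tool.get("inputSchema", {})))
        if paths:
            found[tool["name"]] = paths
    return found


# Hypothetical tool list for illustration only.
tools = [
    {"name": "query", "inputSchema": {"type": "object", "properties": {"q": {"type": "string"}}}},
    {"name": "update", "inputSchema": {"type": "object", "properties": {"payload": True}}},
]
print(report(tools))  # {'update': ['/properties/payload']}
```

Running a check like this in CI turns an opaque provider-side 500 into a named tool and field before deployment.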

**`$ref` causes LLM string serialization.** When `inputSchema` uses `$ref` pointers, models treat the referenced parameters as untyped and serialize objects as JSON-encoded strings; the server then rejects the call with `MCP error -32602: Invalid arguments`. This was confirmed independently in typescript-sdk#1562, Claude Code #18260, and Kiro CLI: the same failure class in three separate tools. TypeScript SDK PR #1460 widened the blast radius by emitting `$ref` on all Zod-registered types.

**`$ref` crashed the TypeScript SDK validator on tools/list.** SDK 1.22.0 introduced AJV-based output schema caching that fails on `$defs` — valid JSON Schema. AWS MCP servers were forced to pin to 1.19.1 (typescript-sdk#1175).
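Both `$ref` failures go away if the schema never contains a reference by the time it leaves the server. Below is a minimal sketch of dereferencing, assuming only local `#/$defs/Name` pointers and no recursive types; `inline_local_refs` is a hypothetical helper, and merging sibling keywords over the referenced schema is a simplification rather than strict JSON Schema semantics.

```python
import copy
from typing import Any

PREFIX = "#/$defs/"


def inline_local_refs(schema: dict) -> dict:
    """Return a copy of `schema` with local `#/$defs/...` references inlined."""
    defs = schema.get("$defs", {})

    def resolve(node: Any, seen: tuple) -> Any:
        if isinstance(node, list):
            return [resolve(item, seen) for item in node]
        if not isinstance(node, dict):
            return node
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(PREFIX):
            name = ref[len(PREFIX):]  # JSON Pointer escapes (~0, ~1) not handled
            if name in seen:
                raise ValueError(f"recursive $ref cannot be inlined: {ref}")
            if name not in defs:
                raise KeyError(f"unresolved $ref: {ref}")
            target = resolve(copy.deepcopy(defs[name]), seen + (name,))
            if not isinstance(target, dict):
                return target  # boolean schema: nothing to merge into
            # Simplification: siblings such as `description` override the target.
            siblings = {k: resolve(v, seen) for k, v in node.items() if k != "$ref"}
            return {**target, **siblings}
        return {k: resolve(v, seen) for k, v in node.items()}

    # Drop the root `$defs` block; every reference to it has been inlined.
    root = {k: v for k, v in schema.items() if k != "$defs"}
    return resolve(root, ())


# Hypothetical schema for illustration only.
schema = {
    "type": "object",
    "properties": {"filter": {"$ref": "#/$defs/Filter", "description": "Row filter"}},
    "$defs": {"Filter": {"type": "object", "properties": {"field": {"type": "string"}}}},
}
print(inline_local_refs(schema))
```

Recursive definitions raise instead of looping forever; a schema that genuinely needs recursion has no finite inlined form and needs a different strategy.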

**`anyOf` + sibling fields blocks all Gemini requests.** Gemini validates ALL tool schemas before processing any request. Gemini requires `anyOf` to be the sole field in a schema object — any sibling field (`type`, `description`, `items`) blocks every request to the provider. The official `@modelcontextprotocol/server-github` triggers this via the `comments` field in `github_create_pull_request_review` (opencode#14509). Workaround: disable the GitHub MCP server.
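A less drastic option than disabling the server is to rewrite the schema for Gemini only. The sketch below assumes the rule as stated above (an object containing `anyOf` may contain nothing else); `strip_anyof_siblings` is a hypothetical helper and the example schema is illustrative, not copied from the GitHub server. The rewrite is lossy: a sibling `description` is dropped, so the model sees less guidance for that field.

```python
from typing import Any


def strip_anyof_siblings(node: Any) -> Any:
    """Return a copy in which every object holding an `anyOf` list holds only that."""
    if isinstance(node, list):
        return [strip_anyof_siblings(item) for item in node]
    if not isinstance(node, dict):
        return node
    # Checking for a list avoids misreading a property that is merely named "anyOf".
    if isinstance(node.get("anyOf"), list):
        return {"anyOf": [strip_anyof_siblings(branch) for branch in node["anyOf"]]}
    return {key: strip_anyof_siblings(value) for key, value in node.items()}


# Hypothetical schema for illustration only.
schema = {
    "type": "object",
    "properties": {
        "comments": {
            "description": "Review comments",
            "anyOf": [{"type": "array", "items": {"type": "object"}}, {"type": "null"}],
        }
    },
}
print(strip_anyof_siblings(schema)["properties"]["comments"])
# {'anyOf': [{'type': 'array', 'items': {'type': 'object'}}, {'type': 'null'}]}
```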

**Missing `additionalProperties:false` breaks OpenAI strict mode.** OpenAI strict function calling requires `additionalProperties: false` on all object schemas. The Go SDK (`mcp-go`) does not add it by default. github-mcp-server#376 confirms all OpenAI calls return 400 Bad Request.
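The missing keyword can be added mechanically before a schema is sent to OpenAI. This is a minimal sketch, assuming object schemas are marked with `"type": "object"` (a list-valued `type` is not handled); `close_object_schemas` is a hypothetical helper. It overwrites any existing `additionalProperties` value, which changes the meaning of map-style schemas, so apply it only on the OpenAI strict path.

```python
from typing import Any


def close_object_schemas(node: Any) -> Any:
    """Return a copy with `additionalProperties: false` on every object schema."""
    if isinstance(node, list):
        return [close_object_schemas(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {key: close_object_schemas(value) for key, value in node.items()}
    if out.get("type") == "object":
        # Forced rather than defaulted: strict mode rejects any other value.
        out["additionalProperties"] = False
    return out


# Hypothetical schema for illustration only.
schema = {
    "type": "object",
    "properties": {
        "owner": {"type": "string"},
        "options": {"type": "object", "properties": {"draft": {"type": "boolean"}}},
    },
}
closed = close_object_schemas(schema)
print(closed["additionalProperties"], closed["properties"]["options"]["additionalProperties"])
# False False
```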

**Tool name collisions silently overwrite.** Microsoft Research found 775 name collisions ecosystem-wide — "search" appears in 32 distinct MCP servers. The ToolRegistry uses a last-registered-wins policy with no warning.
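Collisions are cheap to detect before they reach a registry. The sketch below assumes you can list tool names per configured server; the server and tool names are made up, and the `{server}_{tool}` form follows the prefixing convention recommended later in this finding.

```python
from collections import defaultdict


def find_collisions(servers: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each tool name exposed by more than one server to those servers."""
    owners: dict[str, list[str]] = defaultdict(list)
    for server, tool_names in servers.items():
        for name in tool_names:
            owners[name].append(server)
    return {name: who for name, who in owners.items() if len(who) > 1}


def prefixed_names(servers: dict[str, list[str]]) -> list[str]:
    """Build collision-free names of the form {server}_{tool}."""
    return [f"{server}_{name}" for server, names in servers.items() for name in names]


# Hypothetical inventory for illustration only.
servers = {"github": ["search", "create_issue"], "grafana": ["search", "query"]}
print(find_collisions(servers))  # {'search': ['github', 'grafana']}
print(prefixed_names(servers))
# ['github_search', 'github_create_issue', 'grafana_search', 'grafana_query']
```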

**Wide tool surfaces degrade accuracy by up to 85%.** [Microsoft Research](https://www.microsoft.com/en-us/research/blog/tool-space-interference-in-the-mcp-era-designing-for-agent-compatibility-at-scale/) measured up to 85% performance degradation with large tool spaces and up to 91% accuracy degradation with long tool responses. One MCP tool averaged 557,766 tokens per call — exceeding GPT-5's 272K input limit. [arXiv:2604.21003](https://arxiv.org/abs/2604.21003) found that loading a full MCP catalog upfront imposes a 10–60k input-token tax per turn even when most tools are never called.

## What this means for you

A tool schema that passes local testing with Claude is not safe to deploy against Fireworks AI, Gemini, or OpenAI without a per-provider sanitization pass. The failure modes are provider-specific, silent, and surfaced only in production. Error messages do not identify the offending tool or field.

If you run a multi-provider MCP server, every LLM upgrade or provider addition is a latent schema compatibility test with no safety net. One non-conforming schema in a 40-tool server disables all Gemini access (anyOf), causes sporadic HTTP 500s from Fireworks ([mcp-grafana#594](https://github.com/grafana/mcp-grafana/issues/594)), or makes every OpenAI call return 400 ([github-mcp-server#376](https://github.com/github/github-mcp-server/issues/376)).

The name collision problem means multi-server setups are non-deterministic: which "search" runs depends on config file ordering, not your intent.

## What to do

**Provider-targeted schema sanitization (highest leverage).** Apply a sanitization pass per-provider before forwarding schemas:

1. Convert bare boolean schemas: `true` → `{}` (or `{"type": "object"}` where a provider rejects empty schemas, at the cost of narrowing the accepted values), `false` → `{"not": {}}`
2. Inline `$ref` pointers: dereference all local `#/$defs/X` into inline schemas before serialization (~95 lines, no external dependencies)
3. For Gemini: strip sibling fields when `anyOf` is present
4. For OpenAI strict: add `additionalProperties: false` to all object schemas
5. Apply nullable-aware null stripping (check `ZodNullable` before removing null values)

The gateway layer is the correct place to apply per-provider schema sanitization — do it once, at the boundary.
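One way to structure that boundary is a table of passes per provider, applied in order. The sketch below is illustrative only: the provider keys and function names are assumptions, and only the boolean pass from step 1 is implemented. It maps `true` to `{}`, the schema that is equivalent to `true`; substitute `{"type": "object"}` as in step 1 if a provider rejects empty schemas, accepting that it narrows what the field allows.

```python
from typing import Any, Callable

Pass = Callable[[Any], Any]

SCHEMA_LISTS = ("anyOf", "oneOf", "allOf")
SCHEMA_MAPS = ("properties", "$defs")


def expand_booleans(schema: Any) -> Any:
    """Rewrite bare boolean subschemas: true -> {}, false -> {"not": {}}."""
    if schema is True:
        return {}
    if schema is False:
        return {"not": {}}
    if not isinstance(schema, dict):
        return schema
    out = dict(schema)
    if "items" in out:
        out["items"] = expand_booleans(out["items"])
    for key in SCHEMA_LISTS:
        if isinstance(out.get(key), list):
            out[key] = [expand_booleans(sub) for sub in out[key]]
    for key in SCHEMA_MAPS:
        if isinstance(out.get(key), dict):
            out[key] = {name: expand_booleans(sub) for name, sub in out[key].items()}
    return out


# Hypothetical provider keys; further passes are appended to the relevant list.
PASSES: dict[str, list[Pass]] = {
    "fireworks": [expand_booleans],
    "gemini": [expand_booleans],
    "openai": [expand_booleans],
}


def sanitize(provider: str, schema: dict) -> dict:
    """Apply the provider's passes in order; unknown providers pass through."""
    for step in PASSES.get(provider, []):
        schema = step(schema)
    return schema


raw = {"type": "object", "properties": {"payload": True, "never": False}}
print(sanitize("fireworks", raw))
# {'type': 'object', 'properties': {'payload': {}, 'never': {'not': {}}}}
```

Keeping the passes in a table makes the per-provider differences reviewable in one place and keeps the server's own schema untouched.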

**Tool surface management:**

1. Cap tool count: Cursor's 40-tool limit is empirically motivated, not arbitrary
2. Use lazy schema loading (dynamic tool gating) — defer full schema injection until a tool is selected; [arXiv:2604.21003](https://arxiv.org/abs/2604.21003) shows a 10–60k per-turn token reduction in large-catalog agents
3. Add `compact=true` / `detail_level` parameters on high-output tools
4. Use server-enforced flow hints in descriptions to reduce wrong-path selection
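Item 2 can be pictured as a catalog with two views: a cheap summary sent on every turn and a full definition released only on selection. This is a minimal sketch of that idea, not the mechanism evaluated in the cited paper; `LazyCatalog` and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LazyCatalog:
    """Holds full tool definitions but exposes schemas only on demand."""

    tools: dict[str, dict]                      # tool name -> full definition
    loaded: set[str] = field(default_factory=set)

    def summaries(self) -> list[dict]:
        """Per-turn view: names and descriptions, no input schemas."""
        return [
            {"name": name, "description": tool.get("description", "")}
            for name, tool in self.tools.items()
        ]

    def select(self, name: str) -> dict:
        """Return the full definition once a tool has been chosen."""
        self.loaded.add(name)
        return self.tools[name]


# Hypothetical catalog for illustration only.
catalog = LazyCatalog({
    "search": {
        "description": "Search issues",
        "inputSchema": {"type": "object", "properties": {"q": {"type": "string"}}},
    },
})
print(catalog.summaries())        # [{'name': 'search', 'description': 'Search issues'}]
print(catalog.select("search")["inputSchema"]["properties"])
```

The token saving comes from `summaries()` omitting `inputSchema`; how the selection step is triggered depends on the agent framework.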

**Operational hygiene:**

1. Commit a JSON snapshot of `tools/list` output as a contract file — fail CI on any PR that changes tool schemas without updating the contract. This catches drift before merge, including parallel-PR conflicts.
2. CI step: diff `tools/list` from real server vs. mock manifest on every deploy
3. Inventory tool names across all configured servers before adding a new one; use server-prefixed names proactively: `{server}_{tool}`
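The contract check in item 1 reduces to comparing the live `tools/list` result against a committed snapshot. A minimal sketch, assuming the tool list has already been fetched as a list of dicts; `canonical` and `contract_drift` are hypothetical helpers, and how you obtain the live list depends on your client.

```python
import json


def canonical(tools: list[dict]) -> str:
    """Serialize a tool list deterministically so diffs stay stable."""
    ordered = sorted(tools, key=lambda tool: tool["name"])
    return json.dumps(ordered, sort_keys=True, indent=2)


def contract_drift(live_tools: list[dict], contract_text: str) -> list[str]:
    """Return the names of tools added, removed, or changed versus the contract."""
    live = {tool["name"]: tool for tool in live_tools}
    pinned = {tool["name"]: tool for tool in json.loads(contract_text)}
    return sorted(
        name for name in live.keys() | pinned.keys()
        if live.get(name) != pinned.get(name)
    )


# Hypothetical tools for illustration only.
tools = [{"name": "search", "inputSchema": {"type": "object"}}]
contract = canonical(tools)              # committed alongside the server code

changed = [{"name": "search", "inputSchema": {"type": "object", "properties": {}}}]
print(contract_drift(tools, contract))   # []
print(contract_drift(changed, contract)) # ['search']
```

In CI, a non-empty result fails the build until the contract file is regenerated and reviewed in the same PR.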

**Falsification criterion:** This finding would be disproved by MCP publishing a normative JSON Schema compatibility matrix confirmed to eliminate all listed failure classes, with a conformance test suite that validates schemas across all major LLM providers before server publication.

## Evidence

| Tool | Version | Evidence | Result |
|------|---------|----------|--------|
| [grafana/mcp-grafana](https://github.com/grafana/mcp-grafana/issues/594) | review 2026-02 | source-reviewed | Bare boolean `true` in schema → Fireworks AI HTTP 500 across all tools; isolated via binary search |
| [modelcontextprotocol/typescript-sdk](https://github.com/modelcontextprotocol/typescript-sdk/issues/1562) | 1.22.0 / 1.19.1 | source-reviewed | $ref → LLM string serialization (error -32602); AJV crash on tools/list forced AWS to pin 1.19.1 |
| [anomalyco/opencode](https://github.com/anomalyco/opencode/issues/14509) | review 2026-03 | source-reviewed | anyOf + sibling fields in official GitHub MCP server blocks all Gemini requests |
| [github/github-mcp-server](https://github.com/github/github-mcp-server/issues/376) | review 2026-03 | source-reviewed | Missing additionalProperties:false → all OpenAI calls return 400 Bad Request |
| [Microsoft Research — Tool-space interference](https://www.microsoft.com/en-us/research/blog/tool-space-interference-in-the-mcp-era-designing-for-agent-compatibility-at-scale/) | Q1 2026 | independently-confirmed | 775 name collisions; 85% accuracy degradation with large tool surfaces; 557,766 avg tokens/tool call |
| [arXiv:2604.21003](https://arxiv.org/abs/2604.21003) | 2026-04 | independently-confirmed | Eager schema injection imposes 10–60k token tax per turn; lazy schema loading as mitigation |

**Confidence:** empirical — 6 environments reviewed, multiple independent confirmations.

**Strongest case against:** These failure modes may reflect early-ecosystem growing pains rather than permanent architectural constraints. As MCP matures, providers may converge on a common JSON Schema subset enforced by SDK validators, eliminating the need for per-provider adaptation. The MCP spec's open proposals ([issue #1990](https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1990) for a conformance test suite, [SEP #1814](https://github.com/modelcontextprotocol/modelcontextprotocol/issues/1814) for a caniuse-style compatibility matrix) could yield conformance checks that prevent new failures from entering the ecosystem. Tool surface degradation is also fixable via dynamic tool loading, which several frameworks are actively implementing.

**Open questions:** Does a universal "safe subset" of JSON Schema exist that all current providers accept without sanitization? Will [SEP-2145](https://github.com/modelcontextprotocol/modelcontextprotocol/pull/2145) (error reporting standardization) make failures traceable before they reach production? Does the [CyberArk Full-Schema Poisoning](https://www.cyberark.com/resources/threat-research-blog/poison-everywhere-no-output-from-your-mcp-server-is-safe) finding (attack payloads in parameter names, defaults, enums) require schema sanitization to also serve as a security layer?

Seen different? [Contribute your evidence](https://theorydelta.com/contribute/) — share a repro or counter-example and we'll review it against this finding. Reader evidence is what keeps these findings accurate.
