---
source_block: crewai-multi-agent.md
canonical_url: https://api.theorydelta.com/published/crewai-tool-fabrication-not-planned
published: 2026-05-03
last_verified: 2026-05-03
confidence: empirical
staleness_risk: high
rubric:
  total_claims: 8
  tested_count: 1
  independently_confirmed: true
  unlinked_count: 0
  scope_matches: true
  falsification_stated: true
  content_type: finding
environments_tested:
  - tool: "CrewAI (crewAIInc/crewAI)"
    version: "v1.14.4 (Apr 30 2026)"
    evidence_type: source-reviewed
    result: "Issue tracker reviewed; tool fabrication (#3154), hierarchical delegation (#4783), MCP auth (#4875), and SQL injection (#4993) all closed not-planned in Apr–May 2026 window; v1.14.4 release notes confirm no fixes for any of the four"
  - tool: "CrewAI (crewAIInc/crewAI)"
    version: "v1.14.4 (Apr 30 2026)"
    evidence_type: runtime-tested
    result: "Tool fabrication reproduced in ephemeral Apple container (linux/arm64): crew.kickoff() completed and returned output referencing tool execution; VerifiedTool._run() never invoked (TOOL_INVOKED=False). Additional: CrewAI passed tools=[] to the LLM — tool list absent from this execution path."
  - tool: "CrewAI (crewAIInc/crewAI)"
    version: "v1.10.1"
    evidence_type: source-reviewed
    result: "MCP tools fail on first call (#4796): security_context injected into schema; Pydantic validation rejects every invocation"
# theory_delta renders as a visible "The delta" TL;DR block on the finding page.
# Voice: source-reviewed → "The receipts are public" / "We traced N public issues"
theory_delta: "Tool fabrication reproduced on CrewAI v1.14.4: crew.kickoff() completed with output referencing tool execution while VerifiedTool._run() was never invoked, and the maintainers closed the issue as not-planned."
a2a_card:
  type: finding
  topic: crewai-multi-agent
  claim: "CrewAI agents fabricate tool execution traces without invoking the tool; the framework has no independent verification layer, and the maintainers closed the issue as not-planned."
  confidence: empirical
  action: avoid
  contribute: /api/signals
---

# CrewAI closed tool fabrication, broken delegation, and SQL injection as not-planned

## What you expect

CrewAI markets itself as "AI agents that actually work." The framework's role-based abstractions — agents with roles, goals, backstories, and assigned tools — are supposed to orchestrate reliable work across multi-step workflows. Tool execution is expected to produce real side effects: files written, APIs called, searches run. When verbose logging shows a tool call with arguments and an observation, the tool ran.

## What actually happens

### Tool fabrication: the framework has no verification layer

Agents produce valid-looking tool execution traces — tool name, arguments, observations — without the tool ever being invoked. The LLM generates plausible fake output. The framework treats the LLM's response string as proof of execution; it has no independent layer that checks whether the tool actually ran.

[Issue #3154](https://github.com/crewAIInc/crewAI/issues/3154) (62 comments) documents this as a confirmed failure pattern. Practitioners traced it using Phoenix: tool activity telemetry shows zero invocations despite the agent reporting successful execution. Fabrication is especially prevalent with non-OpenAI models — the tool-calling implementation is coupled to OpenAI's function-calling format. Two open PRs ([#3378](https://github.com/crewAIInc/crewAI/pull/3378), [#4077](https://github.com/crewAIInc/crewAI/pull/4077)) propose fixes. Neither has been merged.

**Maintainer response: closed not-planned on April 19, 2026.** No framework-level fix is coming.

**Reproduced on v1.14.4 (runtime-tested, 2026-05-03):** Running `crew.kickoff()` in an ephemeral Apple container against a crew with a logged tool (`VerifiedTool._run()` writes a flag on invocation), the crew completed and returned output that referenced tool execution — but `_run()` was never called (`TOOL_INVOKED=False`). Additional observation: CrewAI passed `tools=[]` to the LLM, meaning the available tools were not surfaced to the model in this execution path. The fabrication occurs at two levels: the framework does not pass tools to the LLM, and there is no check that any referenced tool was actually dispatched.

### The not-planned pattern: four structural issues, one 6-week window

Between April 17 and May 1, 2026, CrewAI maintainers closed four structural issues as not-planned:

- **[#4783](https://github.com/crewAIInc/crewAI/issues/4783) — Hierarchical delegation permanently broken (closed Apr 17):** Manager agents cannot identify or delegate to workers. The delegation tool injection logic fails during dynamic manager creation. `Process.hierarchical` silently degrades to sequential execution — not a runtime behavior, a code-level structural absence. No fix coming.
- **[#3154](https://github.com/crewAIInc/crewAI/issues/3154) — Silent tool fabrication (closed Apr 19):** Documented above.
- **[#4875](https://github.com/crewAIInc/crewAI/issues/4875) — MCP per-message authentication (closed Apr 29):** A compromised MCP server can inject arbitrary tool calls. No per-message auth validates that tool calls originate from a legitimate source. The framework will not implement IETF draft MCP security countermeasures — no agent identity, no message signing, no tool integrity checks.
- **[#4993](https://github.com/crewAIInc/crewAI/issues/4993) — SQL injection in SnowflakeSearchTool (closed May 1):** User-controlled parameters are injected into SQL queries without validation in the built-in Snowflake integration. The reporter offered a fix PR; it was not merged. The vulnerability remains in the current codebase.

During the same 6-week window, v1.14.4 (released Apr 30 2026) shipped new integrations: Azure OpenAI, You.com, and Tavily. The [v1.14.4 release notes](https://github.com/crewAIInc/crewAI/releases/tag/1.14.4) mention no fixes for any of the four closed issues.

### MCP tools fail on first call (v1.10.1+)

CrewAI injects a `security_context` field into MCP tool call arguments. MCP tool schemas do not define `security_context`; Pydantic validation rejects the call. Every MCP tool call fails on the first invocation. [Issue #4796](https://github.com/crewAIInc/crewAI/issues/4796) remains open as of Apr 19 2026 — no fix confirmed in v1.14.4.

### Silent MCP tool escalation (v1.10.1+)

When an agent's `tools` parameter is `None`, v1.10.1+ auto-loads all registered MCP and platform tools. An agent intentionally left without tool access silently receives every ambient MCP tool. No log warning. No configuration acknowledgment. Multi-agent deployments with per-agent tool scoping must audit this after any upgrade to v1.10.1 or later.

## What this means for you

**Tool fabrication is the worst failure mode an agent framework can have.** If the framework cannot confirm that a tool actually ran, no output from any agent can be trusted without external verification. The issue is not hallucination in the traditional sense — the facts the agent reports about its own execution trace are fabricated. A monitoring dashboard built on `verbose=True` output will show work happening when nothing is happening.

For teams using CrewAI in production:

- Every "tool ran successfully" observation in a trace is unverified unless you have independent telemetry (Phoenix, LangSmith, or equivalent) showing the actual invocation. This is documented in [#3154](https://github.com/crewAIInc/crewAI/issues/3154).
- `Process.hierarchical` is permanently broken per [#4783](https://github.com/crewAIInc/crewAI/issues/4783). Workflows relying on manager-worker delegation have been running in sequential mode since at least April 2026 without indication.
- The not-planned closures are a product direction signal, not a temporary backlog. The maintainers have chosen feature velocity over structural correctness on these four issues.
- Any CrewAI deployment using the built-in SnowflakeSearchTool with user-controlled inputs is SQL-injectable per [#4993](https://github.com/crewAIInc/crewAI/issues/4993) — no fix scheduled.

The ecosystem has voted with download data (Apr 2026 PyPI): LangGraph at 34.5M monthly downloads vs CrewAI's 5.2M. The documented practitioner pattern — prototype in CrewAI, migrate production-critical workflows to LangGraph — is cost-driven and reliability-driven.

## What to do

1. **Add logging inside every tool's `_run()` method.** If the log never fires, the tool was not called. Do not rely on the agent's observation string as evidence of execution. This is the minimum viable defense — it catches fabrication but does not prevent it.

2. **Use OpenAI or Anthropic models directly for tool-heavy workflows.** Custom and local LLMs trigger fabrication most often. The tool-calling layer assumes OpenAI function-calling format; diverging from it increases fabrication risk.

3. **For hierarchical workflows, test delegation explicitly.** Add per-agent logging that records which agent's `_run()` methods fire. If only the first agent fires, `Process.hierarchical` has degraded to sequential. Design your workflow around this reality or implement delegation at the application layer.

4. **If you need MCP tools with CrewAI, pin to a version before v1.10.1 or apply the `ConfigDict(extra='ignore')` workaround manually** to prevent Pydantic schema rejection on `security_context` injection.

5. **For production workflows: evaluate LangGraph.** LangGraph is 34.5M monthly downloads vs CrewAI's 5.2M (Apr 2026 PyPI data). The documented migration pattern: prototype and rapid iteration in CrewAI, migrate control flow and state management to LangGraph. CrewAI's role/crew abstractions remain usable as an inner node for crew definition; they are not reliable as the outer orchestration shell.

**Falsification criterion:** This finding would be disproved if CrewAI reopens and fixes [#3154](https://github.com/crewAIInc/crewAI/issues/3154) (tool fabrication), ships a delegation fix for `Process.hierarchical`, and implements an independent verification layer between agent observation strings and actual tool invocations — or if a future version's telemetry data shows tool invocations consistently matching agent-reported traces across non-OpenAI models.

## Evidence

| Tool | Version | Evidence | Result |
|------|---------|----------|--------|
| [crewAI](https://github.com/crewAIInc/crewAI) | v1.14.4 (Apr 30 2026) | source-reviewed | [Issue #3154](https://github.com/crewAIInc/crewAI/issues/3154) (62 comments) closed not-planned Apr 19; Phoenix telemetry cited in thread shows zero invocations during reported tool executions |
| [crewAI](https://github.com/crewAIInc/crewAI) | v1.14.4 (Apr 30 2026) | source-reviewed | [Issue #4783](https://github.com/crewAIInc/crewAI/issues/4783) closed not-planned Apr 17; hierarchical delegation permanently abandoned |
| [crewAI](https://github.com/crewAIInc/crewAI) | v1.14.4 (Apr 30 2026) | source-reviewed | [Issue #4875](https://github.com/crewAIInc/crewAI/issues/4875) closed not-planned Apr 29; MCP per-message auth will not be implemented |
| [crewAI](https://github.com/crewAIInc/crewAI) | v1.14.4 (Apr 30 2026) | source-reviewed | [Issue #4993](https://github.com/crewAIInc/crewAI/issues/4993) closed not-planned May 1; SQL injection in SnowflakeSearchTool unresolved; fix PR not merged |
| [crewAI](https://github.com/crewAIInc/crewAI) | v1.10.1+ | source-reviewed | [Issue #4796](https://github.com/crewAIInc/crewAI/issues/4796) open; security_context injection causes Pydantic schema rejection on every MCP tool call |
| [crewAI](https://github.com/crewAIInc/crewAI) | v1.14.4 (Apr 30 2026) | source-reviewed | [v1.14.4 release notes](https://github.com/crewAIInc/crewAI/releases/tag/1.14.4) reviewed; no fixes for any of the four not-planned issues |

**Confidence:** empirical — 6 source artifacts reviewed. Four issues independently confirmed by external reporters with 62, 12, and 5+ comment threads. Independent confirmation: practitioners cited Phoenix tracing data in [#3154](https://github.com/crewAIInc/crewAI/issues/3154) showing zero tool invocations during fabricated traces.

**Strongest case against:** The not-planned closures could reflect that these failure modes only manifest on non-standard configurations — non-OpenAI models, Snowflake-specific tool usage, or edge-case MCP server setups that the core team has deprioritized because they affect a small fraction of deployments. Tool fabrication on OpenAI GPT-4o may be rare or non-existent. The 62-comment issue count is engagement, not frequency data. The download differential between LangGraph and CrewAI (34.5M vs 5.2M) measures total downloads, not active production deployments, and CrewAI is newer; the download gap may reflect maturity rather than quality.

**Open questions:** What is the fabrication rate on OpenAI GPT-4o specifically — does it approach zero? Have any of the not-planned issues been reopened after community pressure? Does the `ConfigDict(extra='ignore')` workaround for MCP fully restore functionality or introduce other failures?

Seen different? [Contribute your evidence](https://theorydelta.com/contribute/) — share a repro or counter-example and we'll review it against this finding. Reader evidence is what keeps these findings accurate.
