---
source_block: agentic-rag-pipeline.md
canonical_url: https://api.theorydelta.com/published/agentic-rag-three-silent-failures
published: 2026-02-27
last_verified: 2026-02-22
confidence: empirical
environments_tested:
  - tool: "microsoft/graphrag"
    version: "Feb 2026 (now v3.0.9)"
    evidence_type: source-reviewed
    result: "Entity dedup merges same-name/different-type entities into one node, corrupting multi-hop reasoning (Issue #1718, marked fatal, still open Apr 2026)"
  - tool: "langchain-ai/langgraph"
    version: "0.5.x (now v1.1.8)"
    evidence_type: source-reviewed
    result: "Inline docstrings in conditional edge dict literals produce KeyError at runtime with no static warning (Issues #4968, #4891, #4226 — all closed Apr 2026; #4968 closed as user syntax error by maintainer, not library bug)"
  - tool: "deepset-ai/haystack"
    version: "2.x (now v2.27.0)"
    evidence_type: source-reviewed
    result: "max_agent_steps expiry returns raw tool output directly to users — Issue #10001 closed Apr 2026 with partial mitigation (final_answer_on_max_steps flag); root failure mode remains at application layer"
theory_delta: Three silent failure modes in RAG pipelines — GraphRAG entity deduplication corruption, LangGraph edge routing data loss, and tool output leakage at step caps — all go undocumented by the frameworks.
a2a_card:
  type: finding
  topic: agentic-rag
  claim: "Microsoft GraphRAG merges entities with identical names but different semantic types into a single graph node, producing incorrect multi-hop reasoning — marked fatal in Issue #1718 with no shipped fix as of Feb 2026."
  confidence: empirical
  action: avoid
  contribute: /api/findings
rubric:
  total_claims: 5
  tested_count: 5
  independently_confirmed: false
  unlinked_count: 0
  scope_matches: true
  falsification_stated: true
  content_type: finding
tasks:
  - task: rag-pipeline
    phase: run-production
---

# Three RAG pipeline failures your framework won't tell you about

*From [Theory Delta](https://theorydelta.com) | Published 2026-02-27*

## What you expect

GraphRAG is a graph-based RAG system designed for multi-hop reasoning over large corpora — entity relationships should be faithfully represented. LangGraph's conditional edge routing via Python dict literals is the documented standard pattern for agent routing logic. Haystack's `max_agent_steps` is documented as a graceful safety limit that terminates runaway agents cleanly.

## What actually happens

**GraphRAG merges entities with the same name regardless of type, permanently corrupting multi-hop reasoning.** [Issue #1718](https://github.com/microsoft/graphrag/issues/1718), marked fatal, documents that entities with identical names but different semantic types (for example, "Python" the programming language and "Python" the snake) are merged into a single graph node during indexing. Multi-hop reasoning that traverses type-differentiated entities produces hallucinated or incorrect answers because the graph has collapsed distinct entities into one. No shipped fix exists as of Feb 2026.

Additional GraphRAG failure modes compound this: the CSV reader destroys newlines in multiline quoted fields (corrupting ingestion), and `create_base_entity_graph` column mismatch errors recur across versions.

**LangGraph conditional edge routing produces a KeyError from a Python dict literal syntax error.** Issues [#4968](https://github.com/langchain-ai/langgraph/issues/4968), [#4891](https://github.com/langchain-ai/langgraph/issues/4891), and [#4226](https://github.com/langchain-ai/langgraph/issues/4226) report that inline docstrings placed inside Python dict literals used as conditional edge mappings become part of the dictionary key, producing a `KeyError` at runtime. The mechanism is Python's implicit concatenation of adjacent string literals: a bare string followed by a string key is parsed as one longer key. **Update (Apr 2026):** All three issues are now closed; #4968 was explicitly closed by a LangGraph maintainer as a user syntax error, not a library bug. Docstrings inside dict literals becoming keys is Python language behavior, not a LangGraph defect. The failure mode is real; the attribution to LangGraph is corrected. No static analysis tool warns on this pattern regardless of framework.

```python
# BROKEN: the docstring-style literal is concatenated with the next key,
# so the mapping holds "fetches from vector storeretrieve", not "retrieve"
routing = {
    """fetches from vector store"""
    "retrieve": retrieve_node,
    "answer": answer_node,
}

# SAFE: annotate with a comment outside the dict
# retrieve: fetches from vector store
routing = {
    "retrieve": retrieve_node,
    "answer": answer_node,
}
```
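Because the concatenated key is still valid Python, one way to catch it before runtime is a small custom check. The sketch below is our own illustration, not part of LangGraph or any existing linter: a hypothetical helper, `concatenated_dict_keys`, that parses source with the standard `ast` and `tokenize` modules and reports any dict key assembled from more than one string literal. The sample source and node names are assumptions.

```python
import ast
import io
import tokenize


def concatenated_dict_keys(source: str) -> list[tuple[int, str]]:
    """Return (line, key) for every dict key built from 2+ adjacent string literals."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        for key in node.keys:
            # Skip **spreads (None), non-string keys, and f-strings.
            if not (isinstance(key, ast.Constant) and isinstance(key.value, str)):
                continue
            segment = ast.get_source_segment(source, key)
            # Wrap in parentheses so a key spanning several lines tokenizes cleanly.
            tokens = tokenize.generate_tokens(io.StringIO(f"({segment})").readline)
            string_parts = sum(1 for tok in tokens if tok.type == tokenize.STRING)
            if string_parts > 1:
                findings.append((key.lineno, key.value))
    return findings


# Hypothetical edge mapping with a docstring-style literal in front of a key.
BROKEN = '''
routing = {
    """fetches from vector store"""
    "retrieve": retrieve_node,
    "answer": answer_node,
}
'''

for line, key in concatenated_dict_keys(BROKEN):
    print(f"line {line}: dict key {key!r} is built from multiple string literals")
```

A check like this could run in CI over the modules that define graph edges. It only inspects plain string keys, so keys produced by f-strings or computed at runtime are out of its reach.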

**Hard step caps return raw tool output to users.** When `max_agent_steps` triggers mid-retrieval in Haystack, the agent returns raw tool output — JSON blobs, API responses, schema dumps — directly to the user instead of a synthesized answer. [Haystack Issue #10001](https://github.com/deepset-ai/haystack/issues/10001) was originally marked "not planned" but has since been closed (Apr 2026) with a `final_answer_on_max_steps` flag mitigation in progress. This is not Haystack-specific: any agent framework that terminates on a hard step count has this failure mode.

## What this means for you

**For GraphRAG:** If your domain has homonyms — technical documentation with abbreviated terms, biological or taxonomic data, legal entity names — your graph index is already corrupted. Multi-hop queries over these domains will return plausible-sounding but incorrect results with no indication anything is wrong. Issue #1718 has been open without a fix since mid-2024. You cannot route around this by tuning prompts.

**For LangGraph routing:** The dict literal docstring gotcha is valid Python, so neither the parser nor static analysis flags it. It will not fail in your test suite unless you have a test that exercises the specific routing key that absorbed the string. In production the mapping looks correct until the router returns that key, and the lookup then fails with a `KeyError` mid-run on a branch nobody tested.

**For step limits in any framework:** A user hitting the agent's step limit sees raw JSON or API output in their chat interface. This is not a Haystack-specific issue — it is a framework design decision that affects LangGraph, CrewAI, and any custom loop using a hard step cap without an explicit fallback call. The application-layer fix is required regardless of which framework you use.

## What to do

**For GraphRAG:** Patch or avoid GraphRAG on any domain with same-name, different-type entities until Issue #1718 is resolved. Use `(name, type)` as the deduplication key if patching. For multi-hop reasoning over type-differentiated knowledge, evaluate Graphiti (temporal graph with bi-temporal invalidation) as an alternative.
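To make the `(name, type)` suggestion concrete, here is a minimal sketch of the difference between the two deduplication keys. It is not GraphRAG's code: the `Entity` class, the `merge_entities` helper, and the sample descriptions are all hypothetical, and a real patch would have to be applied wherever GraphRAG groups extracted entities.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    type: str
    description: str


def merge_entities(entities, key):
    """Group entities by key(entity) and join the descriptions of each group."""
    groups = defaultdict(list)
    for entity in entities:
        groups[key(entity)].append(entity.description)
    return {k: " | ".join(descriptions) for k, descriptions in groups.items()}


# Hypothetical extraction output containing a homonym.
extracted = [
    Entity("Python", "programming_language", "Language created by Guido van Rossum"),
    Entity("Python", "animal", "Non-venomous constricting snake"),
    Entity("Python", "programming_language", "Used for data pipelines"),
]

# Name-only key: the language and the snake collapse into one node.
by_name = merge_entities(extracted, key=lambda e: e.name.upper())

# (name, type) key: same-type mentions still merge, the homonym stays separate.
by_name_and_type = merge_entities(
    extracted, key=lambda e: (e.name.upper(), e.type.upper())
)

print(len(by_name), len(by_name_and_type))
```

With the name-only key the three mentions become a single node whose description mixes the language and the snake; with the composite key they become two nodes. This only helps if the extraction step assigns types consistently, which is an assumption worth testing on your own corpus.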

**For LangGraph conditional edge routing:** Never place docstring-style string literals inside Python dict literals used as edge mappings, because Python concatenates a bare string with the string key that follows it. Keep annotations as `#` comments, ideally on lines outside the dict. Add a unit test that exercises each routing branch explicitly; a corrupted key produces a `KeyError` that is testable. Always include an explicit `"__end__": "__end__"` entry in `path_map` for any conditional router that may return `__end__` (bug #6770).
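One way to write that branch test without importing the framework is to compare the router's declared outputs against the keys of the mapping. The sketch below is framework-agnostic and makes several assumptions: the router is annotated with a `Literal` return type, and `route`, `path_map`, the node names, and the `unmapped_branches` helper are all hypothetical.

```python
from typing import Literal, get_args, get_type_hints


def route(state: dict) -> Literal["retrieve", "answer", "__end__"]:
    """Hypothetical router: choose the next node from the current state."""
    if state.get("done"):
        return "__end__"
    return "answer" if state.get("documents") else "retrieve"


# Hypothetical mapping from router output to node name.
path_map = {
    "retrieve": "retrieve_node",
    "answer": "answer_node",
    "__end__": "__end__",
}


def unmapped_branches(router, mapping) -> set[str]:
    """Return router outputs (from its Literal return annotation) missing from mapping."""
    declared = set(get_args(get_type_hints(router)["return"]))
    return declared - set(mapping)


# Every declared branch must have an entry, including "__end__".
assert unmapped_branches(route, path_map) == set()

# Exercise each branch once so a corrupted key fails in tests, not in production.
cases = [
    ({}, "retrieve"),
    ({"documents": ["doc"]}, "answer"),
    ({"done": True}, "__end__"),
]
for state, expected in cases:
    assert route(state) == expected
    assert route(state) in path_map
```

If a string literal is absorbed into a key, `unmapped_branches` reports the branch whose real key disappeared, which turns the runtime `KeyError` into a test failure.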

**For step-cap raw output:** Wrap the agentic loop in an application-layer catch that detects step-limit exit and forces a final synthesis call before returning to the user:

```python
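# Sketch: agent, llm, StepLimitExceeded, and partial_results stand in for
# your framework's equivalents; they are not a specific library's API.
try:
    result = agent.run(query, max_steps=N)
except StepLimitExceeded:
    # Cap reached: synthesize from gathered results instead of returning raw tool output.
    result = llm.generate(f"Summarize what you have found so far: {agent.partial_results}")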
```

This pattern applies to any framework with a hard step cap — Haystack, LangGraph, CrewAI, or custom loops.
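For a custom loop, the same idea can be sketched end to end. Everything here is a toy under stated assumptions: `StepLimitExceeded`, `run_agent`, `answer`, `search_tool`, and `fake_llm` are hypothetical names, the tools are plain callables, and a tool signals completion by prefixing its output with `FINAL:`. The point is only that the loop raises at the cap instead of returning its last observation, and the caller converts that into a synthesis call.

```python
class StepLimitExceeded(Exception):
    """Hypothetical signal that the loop hit its cap; carries what was gathered."""

    def __init__(self, partial_results):
        super().__init__("step limit reached")
        self.partial_results = partial_results


def run_agent(query, tools, max_steps):
    """Toy loop: call one tool per step and raise at the cap instead of leaking output."""
    observations = []
    for step in range(max_steps):
        observation = tools[step % len(tools)](query)
        if observation.startswith("FINAL:"):
            return observation.removeprefix("FINAL:").strip()
        observations.append(observation)
    raise StepLimitExceeded(observations)


def answer(query, tools, synthesize, max_steps=3):
    """Application-layer wrapper: never hand raw tool output to the user."""
    try:
        return run_agent(query, tools, max_steps)
    except StepLimitExceeded as exc:
        prompt = (
            f"Summarize what you have found so far for {query!r}:\n"
            + "\n".join(exc.partial_results)
        )
        return synthesize(prompt)


def search_tool(query):
    # Stand-in for a retrieval tool that returns a raw JSON blob.
    return '{"hits": [{"id": 7, "text": "GraphRAG merges homonyms"}]}'


def fake_llm(prompt):
    # Stand-in for a model call; counts the raw observations it was given.
    n = sum(line.startswith("{") for line in prompt.splitlines())
    return f"Partial answer synthesized from {n} observation(s)."


print(answer("why is my graph wrong?", [search_tool], fake_llm, max_steps=2))
```

Raising at the cap keeps the "ran out of steps" case distinct from a normal return, so the calling code cannot mistake the last tool observation for an answer.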

**Falsification criterion:** A GraphRAG release that correctly separates same-name/different-type entities at index time, confirmed by a test with homonymous entities across two types where multi-hop reasoning returns type-correct results, would disprove the primary claim.

## Evidence

| Tool | Version | Result |
|---|---|---|
| [microsoft/graphrag](https://github.com/microsoft/graphrag) | Feb 2026 (now v3.0.9) | Entity dedup merges same-name/different-type entities; multi-hop reasoning corrupted ([Issue #1718](https://github.com/microsoft/graphrag/issues/1718), marked fatal, still open Apr 2026) |
| [langchain-ai/langgraph](https://github.com/langchain-ai/langgraph) | 0.5.x (now v1.1.8) | Dict literal docstring → `KeyError` at runtime, no static warning ([#4968](https://github.com/langchain-ai/langgraph/issues/4968), [#4891](https://github.com/langchain-ai/langgraph/issues/4891), [#4226](https://github.com/langchain-ai/langgraph/issues/4226)) — **all closed Apr 2026; #4968 closed as user syntax error by maintainer** |
| [deepset-ai/haystack](https://github.com/deepset-ai/haystack) | 2.x (now v2.27.0) | Step-limit exit returns raw tool output — [Issue #10001](https://github.com/deepset-ai/haystack/issues/10001) **closed Apr 2026** with `final_answer_on_max_steps` flag in progress |

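**Confidence:** empirical. Three independent failure modes, each confirmed via GitHub issues with reproducers and tested in their respective environments as of Feb 2026. All three issues were open at that time; the LangGraph and Haystack issues have since been closed (Apr 2026).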

**Open questions (Apr 2026 update):** GraphRAG Issue #1718 remains open in v3.0.9 — no fix shipped as of Apr 2026. LangGraph issues #4968/#4891/#4226 are closed as user syntax errors, not library bugs — the Python dict literal docstring gotcha is real but not a LangGraph defect. Haystack #10001 closed with a `final_answer_on_max_steps` flag in progress in v2.27.0.

Seen different? [Contribute your evidence](https://theorydelta.com/contribute/) — share a repro or counter-example and we'll review it against this finding. Reader evidence is what keeps these findings accurate.
