---
source_block: agentic-rag-pipeline.md
canonical_url: https://api.theorydelta.com/published/agentic-rag-three-silent-failures
published: 2026-02-27
last_verified: 2026-02-22
confidence: empirical
environments_tested:
  - tool: "microsoft/graphrag"
    version: "Feb 2026"
    result: "Entity dedup merges same-name/different-type entities into one node, corrupting multi-hop reasoning (Issue #1718, marked fatal)"
  - tool: "langchain-ai/langgraph"
    version: "0.5.x"
    result: "Inline docstrings in conditional edge dict literals produce KeyError at runtime with no static warning (Issues #4968, #4891, #4226)"
  - tool: "deepset-ai/haystack"
    version: "2.x"
    result: "max_agent_steps expiry returns raw tool output directly to users — marked 'not planned' to fix (Issue #10001)"
theory_delta: "GraphRAG's entity deduplication has a fatal bug — entities with identical names but different types are merged, corrupting multi-hop reasoning. LangGraph conditional edge routing corrupts silently via a Python dict literal footgun with no static warning. Any agent framework with a hard step cap can return raw tool output to users when the cap triggers mid-retrieval; no framework documents this or provides a built-in mitigation."
a2a_card:
  type: finding
  topic: agentic-rag
  claim: "Microsoft GraphRAG merges entities with identical names but different semantic types into a single graph node, producing incorrect multi-hop reasoning — marked fatal in Issue #1718 with no shipped fix as of Feb 2026."
  confidence: empirical
  action: avoid
  contribute: /api/findings
rubric:
  total_claims: 5
  tested_count: 3
  independently_confirmed: false
  unlinked_count: 0
  scope_matches: true
  falsification_stated: true
  content_type: finding
---

# Three Agentic RAG Failures the Docs Don't Mention: GraphRAG Entity Corruption, LangGraph Routing Footgun, and Step-Limit Raw Output

*From [Theory Delta](https://theorydelta.com) | Published 2026-02-27*

## What the docs say

Microsoft GraphRAG presents itself as a graph-based RAG system for multi-hop reasoning over large corpora. LangGraph documents conditional edge routing via Python dict literals as the standard pattern for agent routing logic. Haystack documents `max_agent_steps` as a safety limit that terminates runaway agents gracefully.

## What actually happens

**GraphRAG merges entities with the same name, regardless of type.** [Issue #1718](https://github.com/microsoft/graphrag/issues/1718), marked fatal, documents that entities with identical names but different semantic types — "Python" as a programming language and "Python" as a snake, for example — are merged into a single graph node during indexing. Multi-hop reasoning that traverses type-differentiated entities produces hallucinated or incorrect answers because the graph has collapsed distinct entities into one. The fix is deduplication by `(name, type)` tuple, not name alone. No shipped fix exists as of Feb 2026. Builders using GraphRAG on domains with same-name, different-type entities (technical documentation with homonyms, biological/taxonomic data, legal entity names) cannot rely on multi-hop reasoning results until this is resolved.
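The difference between the two deduplication keys can be illustrated with a minimal sketch. This is not GraphRAG's indexing code: `Entity`, `dedupe`, and the sample records are hypothetical names invented for illustration, and the merge step is reduced to joining descriptions.

```python
from collections import defaultdict
from typing import NamedTuple


class Entity(NamedTuple):
    # Hypothetical record shape, not GraphRAG's schema.
    name: str
    type: str
    description: str


def dedupe(entities, key):
    """Group entities by `key` and merge each group's descriptions into one node."""
    groups = defaultdict(list)
    for entity in entities:
        groups[key(entity)].append(entity.description)
    return {k: " | ".join(v) for k, v in groups.items()}


entities = [
    Entity("Python", "LANGUAGE", "A programming language."),
    Entity("Python", "ANIMAL", "A large constricting snake."),
    Entity("Python", "LANGUAGE", "Created by Guido van Rossum."),
]

# Name-only key: the snake and the language collapse into a single node.
by_name = dedupe(entities, key=lambda e: e.name.lower())

# (name, type) key: one node per distinct entity.
by_name_and_type = dedupe(entities, key=lambda e: (e.name.lower(), e.type))

print(len(by_name), len(by_name_and_type))
```

With the name-only key, the merged node's description mixes facts about the snake and the language, which is the kind of collapsed node a multi-hop traversal would then read from.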

Additional GraphRAG failure modes compound this: the CSV reader destroys newlines in multiline quoted fields (corrupting ingestion), and `create_base_entity_graph` column mismatch errors recur across versions.
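A quick pre-ingestion check can show whether an input file contains the multiline quoted fields that are at risk. The sketch below uses only Python's standard `csv` module and does not reproduce GraphRAG's reader; the sample data is invented, and the line-splitting comparison is only an illustration of how a line-oriented parse would break such a field.

```python
import csv
import io

# Invented sample: the second column of row 1 contains an embedded newline.
raw = 'id,text\r\n1,"first line\nsecond line"\r\n'

# A quote-aware parse keeps the embedded newline inside the field.
rows = list(csv.reader(io.StringIO(raw, newline="")))

# A naive line split cuts the quoted field in two.
naive_lines = raw.splitlines()

# Fields that a newline-destroying reader would corrupt.
multiline_fields = [cell for row in rows for cell in row if "\n" in cell]

print(len(rows), len(naive_lines), multiline_fields)
```

If `multiline_fields` is non-empty for a real input file, it may be worth comparing the ingested text against the source before trusting the index.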

**LangGraph conditional edge routing corrupts silently from a Python dict literal footgun.** Issues [#4968](https://github.com/langchain-ai/langgraph/issues/4968), [#4891](https://github.com/langchain-ai/langgraph/issues/4891), [#4226](https://github.com/langchain-ai/langgraph/issues/4226), and [#4258](https://github.com/langchain-ai/langgraph/issues/4258) all trace to the same root: inline docstrings placed inside Python dict literals used as conditional edge mappings become part of the dictionary key. The mechanism is Python's implicit concatenation of adjacent string literals: a bare string written next to a key string is joined to it, whereas a `#` comment is discarded by the tokenizer and is harmless. The routing key silently changes at definition time. The failure appears as a `KeyError` at runtime during tool routing, sometimes swallowed entirely under async streaming. No static analysis tool warns on this. The pattern appears in no official LangGraph documentation as a known hazard.

```python
# BROKEN: a docstring-style string literal next to a key is concatenated
# with it, because Python joins adjacent string literals.
routing = {
    "retrieve"
    "fetches from vector store": "retrieve_node",
    "answer": "answer_node",
}
# The first key is now "retrievefetches from vector store", not "retrieve".

# SAFE: annotate with # comments, which never become part of the key.
routing = {
    "retrieve": "retrieve_node",  # fetches from vector store
    "answer": "answer_node",
}
```

**Hard step caps return raw tool output to users.** When `max_agent_steps` triggers mid-retrieval in Haystack, the agent returns raw tool output (JSON blobs, API responses, schema dumps) directly to the user instead of a synthesized answer. [Haystack Issue #10001](https://github.com/deepset-ai/haystack/issues/10001) is marked "not planned", meaning no fix at the framework level. This is not Haystack-specific: any agent framework that terminates on a hard step count can exhibit this failure mode when the cap fires before a final answer has been produced. The fix is an explicit final-answer fallback at the application layer, triggered on step-limit exit. No framework documents this or provides a built-in mitigation.

## What to do instead

**For GraphRAG:** Patch or avoid GraphRAG on any domain with same-name, different-type entities until Issue #1718 is resolved. Use `(name, type)` as the deduplication key if patching. For multi-hop reasoning over type-differentiated knowledge, evaluate Graphiti (temporal graph with bi-temporal invalidation) as an alternative — see `agent-memory-landscape.md`.

**For LangGraph conditional edge routing:** Never place docstring-style string literals inside Python dict literals used as edge mappings; Python concatenates adjacent string literals, so the stray string fuses with the neighboring key. Write explanatory notes as `#` comments, never as string literals. Add a unit test that exercises each routing branch explicitly; a corrupted key produces a `KeyError` that is testable. Do not rely on type checking or static analysis to catch this, because the corrupted dict is syntactically valid.
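One way to make the corruption fail at definition time rather than during routing is to compare the mapping's keys against the labels the router can return. The sketch below is plain Python and does not call LangGraph; `validate_routes`, `ROUTER_LABELS`, and the node names are hypothetical, chosen only for illustration.

```python
def validate_routes(path_map, expected_labels):
    """Raise immediately if the mapping's keys differ from the router's labels."""
    missing = set(expected_labels) - set(path_map)
    unexpected = set(path_map) - set(expected_labels)
    if missing or unexpected:
        raise ValueError(
            "routing keys do not match router labels: "
            f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    return path_map


# Every label the routing function is allowed to return (assumed for this sketch).
ROUTER_LABELS = ("retrieve", "answer")

good = validate_routes(
    {"retrieve": "retrieve_node", "answer": "answer_node"}, ROUTER_LABELS
)

# A stray string literal fuses with the key it follows.
corrupted = {
    "retrieve"
    "fetches from vector store": "retrieve_node",
    "answer": "answer_node",
}

try:
    validate_routes(corrupted, ROUTER_LABELS)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Wrapping the mapping in a check like this before handing it to the graph builder turns a silent runtime `KeyError` into an error at import or graph-construction time. The same comparison can live in a unit test instead.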

**For step-cap raw output:** Wrap the agentic loop in an application-layer catch that detects step-limit exit (catch the framework's step-limit exception or check the exit reason) and forces a final synthesis call before returning to the user:

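```python
# Sketch only: agent.run, StepLimitExceeded, agent.partial_results, llm.generate,
# and N are placeholders for your framework's equivalents, not a real API.
try:
    result = agent.run(query, max_steps=N)
except StepLimitExceeded:
    # The cap fired mid-retrieval: synthesize from what was gathered so far.
    result = llm.generate(
        f"Summarize what you have found so far: {agent.partial_results}"
    )
```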

This pattern applies to any framework with a hard step cap — Haystack, LangGraph, CrewAI, or custom loops.
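For frameworks that return normally when the cap fires instead of raising, the same fallback can key off an exit reason. The following is a self-contained toy, not any framework's API: `LoopResult`, `run_agent`, `synthesize`, and `answer` are hypothetical names, and `synthesize` stands in for a real LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class LoopResult:
    output: str
    exit_reason: str  # "final_answer" or "step_limit"
    tool_outputs: list = field(default_factory=list)


def synthesize(query, tool_outputs):
    """Stand-in for an LLM call that turns gathered findings into prose."""
    return f"Answer to {query!r} from {len(tool_outputs)} tool result(s)."


def run_agent(query, tools, max_steps):
    """Toy agent loop: one tool call per step, done once every tool has run."""
    collected = [tool(query) for tool in tools[:max_steps]]
    if len(collected) == len(tools):
        return LoopResult(synthesize(query, collected), "final_answer", collected)
    # Cap hit before finishing: the last raw tool output becomes the result.
    raw = collected[-1] if collected else ""
    return LoopResult(raw, "step_limit", collected)


def answer(query, tools, max_steps):
    """Application-layer wrapper: never hand a step-limited result to the user."""
    result = run_agent(query, tools, max_steps)
    if result.exit_reason == "step_limit":
        return synthesize(query, result.tool_outputs)
    return result.output


tools = [
    lambda q: '{"hits": [{"id": 1}]}',
    lambda q: '{"doc": "raw schema dump"}',
]

capped = answer("what is X?", tools, max_steps=1)
complete = answer("what is X?", tools, max_steps=2)
print(capped)
print(complete)
```

The design choice is that the wrapper, not the agent, owns the guarantee about what reaches the user. That keeps the mitigation independent of whichever framework runs the loop.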

## Environments tested

| Tool | Version | Result |
|------|---------|--------|
| [microsoft/graphrag](https://github.com/microsoft/graphrag) | Feb 2026 | Entity dedup merges same-name/different-type entities — multi-hop reasoning corrupted ([Issue #1718](https://github.com/microsoft/graphrag/issues/1718), marked fatal) |
| [langchain-ai/langgraph](https://github.com/langchain-ai/langgraph) | 0.5.x | Dict literal docstring footgun → `KeyError` at runtime, no static warning ([#4968](https://github.com/langchain-ai/langgraph/issues/4968), [#4891](https://github.com/langchain-ai/langgraph/issues/4891), [#4226](https://github.com/langchain-ai/langgraph/issues/4226)) |
| [deepset-ai/haystack](https://github.com/deepset-ai/haystack) | 2.x | Step-limit exit returns raw tool output to users — marked not planned to fix ([Issue #10001](https://github.com/deepset-ai/haystack/issues/10001)) |

## Confidence and gaps

**Confidence:** empirical. Three independent failure modes, each confirmed via public GitHub issues with reproducers and tested in its respective environment as of Feb 2026.

**Open questions:** Has a fix for GraphRAG Issue #1718 shipped since Feb 2026? Does the LangGraph dict literal footgun affect all conditional edge patterns or only specific LangGraph versions? Does the step-limit raw-output failure also appear when LangGraph's `recursion_limit` is hit, as it does with Haystack's `max_agent_steps`?

**This claim would be disproved by observing:** A GraphRAG release that correctly separates same-name/different-type entities at index time, confirmed by a test with homonymous entities across two types where multi-hop reasoning returns type-correct results. Or a LangGraph release that statically warns or raises at definition time when docstrings appear inside conditional edge dict literals.

Seen different? [Contribute your evidence](https://theorydelta.com/contribute/)
