---
source_block: agentic-rag-pipeline.md
canonical_url: https://api.theorydelta.com/published/langgraph-checkpoint-serialization-silent-loss
published: 2026-03-29
last_verified: 2026-03-01
confidence: empirical
environments_tested:
  - tool: "LangGraph (langchain-ai)"
    version: "v1.0.10"
    result: "source-reviewed: JsonPlusSerializer replaces deserialized values with None on failure — no exception raised (bug #6970, open)"
  - tool: "LangGraph (langchain-ai)"
    version: "v1.0.10"
    result: "source-reviewed: StrEnum values silently coerced to plain str after checkpoint round-trip — type information lost (bug #6598)"
  - tool: "Microsoft GraphRAG"
    version: "v3.0.5"
    result: "source-reviewed: v3 pipeline is extremely slow compared to v2 after NetworkX removal; regression unresolved (issue #2250, open)"
  - tool: "LangGraph (langchain-ai)"
    version: "v1.0.10"
    result: "source-reviewed: get_state().next returns empty tuple after resuming from first of two interrupt() calls — graph paused but snapshot reports complete (bug #6956, open)"
theory_delta: "GraphRAG v3 (Jan 2026) trades NetworkX for a DataFrame-based pipeline and ships a performance regression vs v2 (issue #2250). LangGraph serialization fails closed silently across four documented modes since Jan 2026 — checkpoint round-trips are lossy for non-primitive types, with no exception raised."
a2a_card:
  type: finding
  topic: agentic-rag-pipeline
  claim: "LangGraph checkpoint round-trips are lossy for non-primitive types — four distinct silent failure modes (JsonPlusSerializer null-on-failure, StrEnum→str coercion, nested Enum→None, BinaryOperatorAggregate wrapper leak) corrupt state without raising exceptions, making stateful RAG pipelines unreliable."
  confidence: empirical
  action: test
  contribute: /api/signals
rubric:
  total_claims: 6
  tested_count: 0
  independently_confirmed: true
  unlinked_count: 0
  scope_matches: true
  falsification_stated: true
  content_type: finding
---

# LangGraph checkpoint round-trips silently corrupt non-primitive types — four distinct confirmed modes

*From [Theory Delta](https://theorydelta.com) | [Methodology](https://theorydelta.com/methodology/) | Published 2026-03-29*

## What the docs say

LangGraph documentation presents checkpointing as a reliable mechanism for persisting and resuming stateful agent workflows. The checkpoint/resume pattern is the foundation of LangGraph's human-in-the-loop features and is used extensively in production agentic RAG pipelines that store Pydantic models, Enums, or custom classes in graph state.

## What actually happens

LangGraph checkpoint round-trips are lossy for non-primitive types. Four distinct silent failure modes have been confirmed since January 2026 in open bugs, all affecting LangGraph v1.0.10:

**1. JsonPlusSerializer null-on-failure ([bug #6970](https://github.com/langchain-ai/langgraph/issues/6970), open as of 2026-02-28):** When deserialization fails, `JsonPlusSerializer` replaces the failed value with `None` instead of raising an exception. The graph continues with a corrupted state object. The failure is invisible — no warning, no log entry, no exception. This affects any complex type stored in checkpoint state.
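Until the bug is fixed, one defensive pattern is to treat `None` in a required field as a failed deserialization and fail loudly. A minimal stdlib sketch (which fields count as required is up to your own state schema):

```python
def strict_get(values: dict, key: str):
    """Fetch a required checkpoint field, raising instead of propagating a silent None."""
    value = values.get(key)
    if value is None:
        raise ValueError(
            f"checkpoint field {key!r} is None: possible silent "
            "deserialization failure (see langgraph #6970)"
        )
    return value
```

A call like `strict_get(state.values, "retrieved_docs")` after a resume turns a silent `None` into a loud failure (the field name here is hypothetical).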

**2. StrEnum coerced to plain str ([bug #6598](https://github.com/langchain-ai/langgraph/issues/6598), January 2026):** `StrEnum` values silently become plain `str` after a checkpoint round-trip. Type information is lost. Code checking `isinstance(value, MyStrEnum)` will fail silently after a resume. Any state machine logic that routes on enum type (rather than enum value) breaks without error.
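The failure mode can be reproduced with a plain JSON round-trip, which is a stand-in for what a checkpoint cycle does to the value (stdlib only; the enum here is illustrative):

```python
import json
from enum import Enum

class Stage(str, Enum):  # same failure shape applies to StrEnum on Python 3.11+
    RETRIEVE = "retrieve"
    SYNTHESIZE = "synthesize"

state = {"stage": Stage.RETRIEVE}
restored = json.loads(json.dumps(state))  # stand-in for a checkpoint round-trip

assert isinstance(state["stage"], Stage)         # before: enum member
assert not isinstance(restored["stage"], Stage)  # after: plain str
assert restored["stage"] == "retrieve"           # the value survives; the type does not
```

Routing on the enum's value (`value == "retrieve"`) survives the round-trip; routing on its type (`isinstance`) silently stops matching.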

**3. Nested Enum fields become None ([bug #6718](https://github.com/langchain-ai/langgraph/issues/6718), February 2026):** Nested `Enum` fields in checkpoint state deserialize as `None` rather than raising. Like bug #6970, this is silent replacement — the state object looks valid but contains corrupted values.

**4. BinaryOperatorAggregate wrapper leak ([bug #6909](https://github.com/langchain-ai/langgraph/issues/6909), 2026-02-27):** When a channel starts `MISSING`, `BinaryOperatorAggregate` with `Overwrite` returns the wrapper object rather than the unwrapped payload. Downstream code receives a `BinaryOperatorAggregate` instance where it expects the actual state value.
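A generic guard can catch a leaked wrapper at the node boundary. This is a sketch with a stand-in wrapper class; in a real pipeline you would pass the actual `BinaryOperatorAggregate` type, assuming it is importable in your LangGraph version:

```python
def assert_payload(value, wrapper_types: tuple):
    """Raise if a channel handed back its wrapper object instead of the unwrapped payload."""
    if isinstance(value, wrapper_types):
        raise TypeError(
            f"channel returned wrapper {type(value).__name__} instead of a payload "
            "(see langgraph #6909)"
        )
    return value

class FakeAggregate:
    """Stand-in for the real wrapper type, for illustration only."""
    pass
```

Usage: `assert_payload(state.values["scores"], (FakeAggregate,))`, with the real wrapper class substituted for `FakeAggregate` and `"scores"` a hypothetical field.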

These are not edge cases in obscure usage paths. They affect any LangGraph pipeline storing Pydantic models, Enums, or custom classes through checkpointing — which is most production agentic RAG pipelines.

### Interrupt state snapshot bug affects human-in-the-loop workflows

[Bug #6956](https://github.com/langchain-ai/langgraph/issues/6956) (open, 2026-02-27): `get_state().next` returns an empty tuple `()` after resuming from the first of two `interrupt()` calls in the same node. The graph is still paused — but the snapshot reports it as complete. Any code checking `state.next` to determine whether a graph is still running will silently misread a paused graph as finished. Human-in-the-loop workflows with chained interrupts are directly affected. An agent waiting for human approval may receive a "complete" signal and proceed without it.

### Conditional edge routing can corrupt branch selection

[Issues #4968](https://github.com/langchain-ai/langgraph/issues/4968), [#4891](https://github.com/langchain-ai/langgraph/issues/4891), [#4226](https://github.com/langchain-ai/langgraph/issues/4226): a bare string literal placed as an inline docstring inside a Python dict literal used as a conditional edge mapping is implicitly concatenated into the adjacent dictionary key, corrupting the routing key and producing a `KeyError` at runtime during tool routing. Under async streaming, the error may be swallowed entirely. A newer variant ([bug #6770](https://github.com/langchain-ai/langgraph/issues/6770)): `KeyError('__end__')` when a conditional router returns `'__end__'` but `path_map` does not explicitly include an `__end__`/END key. Fix: add `"__end__": "__end__"` to `path_map`.
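Both routing pitfalls reduce to dictionary-key mechanics and can be reproduced with plain dicts. The final lookup below simulates what the framework does internally with `path_map`, which is an assumption for illustration:

```python
def route(state: dict) -> str:
    """Toy conditional router: finish when the state says it is done."""
    return "__end__" if state.get("done") else "tools"

# Pitfall 1: adjacent string literals are implicitly concatenated into one key,
# so this "docstring" silently merges with "tools" and the intended key vanishes.
corrupted = {
    "tools"
    "this string was meant as documentation": "tools",
}
assert "tools" not in corrupted  # routing on "tools" would now raise KeyError

# Pitfall 2: the router may return "__end__", so path_map must map it explicitly
path_map = {"tools": "tools", "__end__": "__end__"}
assert path_map[route({"done": True})] == "__end__"
assert path_map[route({})] == "tools"
```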

### GraphRAG v3 shipped a performance regression vs v2

GraphRAG v3 (January 2026, current: v3.0.5) removed the NetworkX dependency and moved to DataFrame-based graph utilities. [Issue #2250](https://github.com/microsoft/graphrag/issues/2250) (2026-02-26) documents the v3 pipeline as "extremely slow compared to v2." The regression is unresolved in v3.0.5. Teams that benchmarked on v2 must re-benchmark before deploying v3. The v3 restructure also adds opt-in LLM-based entity resolution ([PR #2234](https://github.com/microsoft/graphrag/pull/2234), open) that addresses semantic fragmentation ("Ahab" vs "Captain Ahab") — but the entity type deduplication bug ([issue #1718](https://github.com/microsoft/graphrag/issues/1718), marked fatal, still open) is orthogonal and unresolved. Both problems can coexist.

**This finding would be disproved by:** LangGraph v1.0.10+ passing a round-trip checkpoint test where Pydantic models, StrEnum, nested Enum, and BinaryOperatorAggregate values are preserved with type fidelity after a checkpoint cycle. It would also be disproved for the GraphRAG regression by a benchmark showing v3 matching or exceeding v2 throughput at equivalent corpus size.

## What to do instead

**For LangGraph stateful pipelines:** Treat checkpoint round-trips as lossy for non-primitive types until bugs #6970, #6598, #6718, and #6909 are closed. Add explicit checkpoint validation after every `resume` call:

```python
# After resuming a LangGraph graph
state = graph.get_state(config)

# Validate that critical fields are not None and have the expected types.
# "my_enum" and MyExpectedType are placeholders for your own state schema.
assert state.values.get("my_enum") is not None, "checkpoint deserialization failure"
assert isinstance(state.values["my_enum"], MyExpectedType), \
    f"type corrupted: {type(state.values['my_enum'])}"
```

For state that must survive checkpoint round-trips, prefer primitive types (str, int, dict with primitive values) over Pydantic models and Enums where possible. If Enums are required, serialize them to their `.value` before storing in graph state and reconstruct on read.
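The store-as-value, reconstruct-on-read pattern looks like this (stdlib only; `Phase` is an illustrative enum):

```python
from enum import Enum

class Phase(Enum):
    PLAN = "plan"
    ACT = "act"

# Write: store only the primitive value in graph state
stored = {"phase": Phase.PLAN.value}

# Read: reconstruct explicitly. Phase("bogus") raises ValueError instead of
# silently yielding None, so any corruption fails loudly at the read site.
phase = Phase(stored["phase"])
assert phase is Phase.PLAN
```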

**For human-in-the-loop workflows with chained interrupts:** Do not rely solely on `state.next` to determine if a graph is paused. Track interrupt state explicitly in your application layer until bug #6956 is closed.
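One way to track interrupt state at the application layer is a small bookkeeping object that your interrupt-raising and resume-handling code updates explicitly, so pause detection never depends on `state.next`. All names here are hypothetical:

```python
class InterruptLedger:
    """Application-level record of outstanding interrupts (hypothetical helper)."""

    def __init__(self):
        self._pending: set[str] = set()

    def raised(self, interrupt_id: str) -> None:
        """Record an interrupt just before the graph pauses on it."""
        self._pending.add(interrupt_id)

    def resolved(self, interrupt_id: str) -> None:
        """Clear an interrupt once its resume value has been applied."""
        self._pending.discard(interrupt_id)

    @property
    def paused(self) -> bool:
        return bool(self._pending)
```

Call `raised()` where your node issues `interrupt()` and `resolved()` where you feed the resume value back in; check `ledger.paused` instead of trusting an empty `state.next`.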

**For conditional edge routing:** Do not use inline docstrings inside Python dict literals in edge mappings. Always include an explicit `"__end__": "__end__"` entry in `path_map` for any conditional router that may return `__end__`.

**For GraphRAG v3:** If migrating from v2, benchmark your specific corpus before deploying to production. The regression in issue #2250 is unresolved. If performance is critical and entity resolution quality is acceptable in v2, consider staying on v2 until the regression is addressed.
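A minimal harness for the re-benchmark, where `run_v2_index` and `run_v3_index` are placeholders for however you invoke each pipeline on the same corpus:

```python
import time

def time_run(fn, *args, **kwargs) -> float:
    """Wall-clock one pipeline invocation."""
    start = time.perf_counter()
    fn(*args, **kwargs)
    return time.perf_counter() - start

# v2_seconds = time_run(run_v2_index, corpus_path)
# v3_seconds = time_run(run_v3_index, corpus_path)
# Deploy v3 only if v3_seconds is within your budget relative to v2_seconds.
```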

**For agentic RAG with step limits:** When `max_agent_steps` triggers mid-retrieval, frameworks return raw tool output — JSON, an API response, a schema — instead of a synthesized answer. Wrap any agentic loop in a handler that detects a step-limit exit and forces one final synthesis call before returning to the user.
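A sketch of that wrap, where `run_agent` and `synthesize_answer` are placeholders for your framework's entry points and the `stop_reason` sentinel is an assumption about what your loop reports:

```python
def answer_with_synthesis(run_agent, synthesize_answer, question: str, max_steps: int = 8):
    """Force a final synthesis pass when the agent loop exits on its step limit."""
    result = run_agent(question, max_steps=max_steps)
    if result.get("stop_reason") == "max_steps":
        # The loop was cut off mid-retrieval: result["raw"] is unsynthesized tool
        # output, not an answer. Run one explicit synthesis call before returning.
        return synthesize_answer(question, result.get("raw", ""))
    return result["answer"]
```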

## Environments tested

| Tool | Version | Result |
|------|---------|--------|
| [LangGraph](https://github.com/langchain-ai/langgraph) | v1.0.10 | source-reviewed: JsonPlusSerializer replaces deserialization failures with None ([#6970](https://github.com/langchain-ai/langgraph/issues/6970), open) |
| [LangGraph](https://github.com/langchain-ai/langgraph) | v1.0.10 | source-reviewed: StrEnum coerced to str after checkpoint round-trip ([#6598](https://github.com/langchain-ai/langgraph/issues/6598)) |
| [LangGraph](https://github.com/langchain-ai/langgraph) | v1.0.10 | source-reviewed: nested Enum fields become None after resume ([#6718](https://github.com/langchain-ai/langgraph/issues/6718)) |
| [LangGraph](https://github.com/langchain-ai/langgraph) | v1.0.10 | source-reviewed: BinaryOperatorAggregate returns wrapper instead of payload ([#6909](https://github.com/langchain-ai/langgraph/issues/6909)) |
| [LangGraph](https://github.com/langchain-ai/langgraph) | v1.0.10 | source-reviewed: get_state().next empty after first of two interrupt() calls ([#6956](https://github.com/langchain-ai/langgraph/issues/6956), open) |
| [Microsoft GraphRAG](https://github.com/microsoft/graphrag) | v3.0.5 | source-reviewed: v3 pipeline extremely slow vs v2 after NetworkX removal ([#2250](https://github.com/microsoft/graphrag/issues/2250), open) |

## Confidence and gaps

**Confidence:** empirical — all four LangGraph serialization bugs are confirmed in open GitHub issues by third-party reporters (not Theory Delta). The GraphRAG performance regression is confirmed in a separate user-filed issue. Not tested by execution in Theory Delta's environment — these are source-reviewed from the respective GitHub issue trackers. The bug status (open vs closed) reflects the state as of 2026-03-01; some may have been addressed in subsequent LangGraph releases.

**Strongest case against:** These bugs may already be fixed in LangGraph versions later than v1.0.10. Open issues do not guarantee unfixed behavior — LangGraph releases frequently. The serialization failures affect specific type patterns; pipelines using only primitive types in checkpoint state are unaffected. The GraphRAG regression may be workload-dependent and could be a benchmark-specific observation rather than universal throughput degradation.

**Open questions:** Which LangGraph version (if any) closes all four serialization bugs? Is there a LangGraph release where checkpoint round-trips can be considered reliable for Pydantic models? Does the GraphRAG v3 performance regression appear for all corpus sizes, or only at specific scale thresholds?

Seen different? [Contribute your evidence](https://theorydelta.com/contribute/) — theory delta is what makes this knowledge base work.
