The Memory Architecture of Claude Code

A deep dive into how Claude Code remembers you across sessions

April 2026

Introduction

Since the rise of frontier AI labs like OpenAI, Anthropic, and Google DeepMind, progress in AI systems has largely taken place behind closed doors, with little visibility into their underlying implementations. That changed last week, when the source code for Claude Code leaked—offering a rare glimpse into how Anthropic's flagship product handles memory.

Large language models are inherently stateless. Every session begins from scratch: you re-explain your project, restate your preferences, and reestablish context—only to repeat the process the next day. This limitation is well known, and a wide range of solutions has emerged, from retrieval-augmented generation (RAG) with vector databases to more advanced approaches like Graph RAG and file-based search systems.

Claude Code takes a different approach. It relies on a file-based memory system combined with extensive prompt engineering. This post presents a deep dive into that system, reverse-engineered from the leaked source code. At its core, the design is surprisingly simple: memories are stored as short (~150-character) text entries written to disk. The responsibility for reading and writing these memories is delegated to the language model itself.

The system boils down to five core paths: three for reading memories and two for writing them. All paths operate on a shared storage of markdown files on disk.

The Architecture at a Glance

```mermaid
flowchart TB
    MEM["<b>Memory Directory</b>\n.md files with frontmatter + MEMORY.md index"]
    MEM --> R1
    MEM --> R2
    MEM --> R3
    subgraph Read ["Read Paths"]
        direction LR
        R1["<b>1. Chat Start</b>\nMEMORY.md → user context"]
        R2["<b>2. Per-Message Prefetch</b>\nSonnet selects up to 5 files"]
        R3["<b>3. Tool Call Read Path</b>\nModel calls Read tool"]
    end
    R1 --> Agent
    R2 --> Agent
    R3 --> Agent
    Agent["Main Agent"]
    Agent --> W1
    Agent --> W2
    subgraph Write ["Write Paths"]
        direction LR
        W1["<b>1. Background Extraction</b>\nSubagent after final response"]
        W2["<b>2. Model Writes Directly</b>\nWrite/Edit during reasoning"]
    end
```

The Memory Directory

All memories live as simple markdown files in a single directory on disk: ~/.claude/projects/<project-slug>/memory/.

Each memory file has YAML frontmatter with three fields: a name, a one-line description, and a type. The description is important — it is what the system reads when deciding which memories are relevant to a given message, without reading the full file. This decision is made by an LLM call to Sonnet — more on that later.

There are four memory types:

user

Who you are, your role, your expertise, your preferences.

"I've been writing Go for ten years but this is my first time touching the React side of this repo"

→ saves: deep Go expertise, new to React — frame frontend explanations in terms of backend analogues

feedback

Guidance you've given about how to work. Both corrections and confirmations.

"Don't mock the database in these tests — we got burned last quarter when mocked tests passed but the prod migration failed"

→ saves: integration tests must hit a real database, not mocks

project

Context about ongoing work not in code or git history. Deadlines, motivations.

"We're freezing all non-critical merges after Thursday — mobile team is cutting a release branch"

→ saves: merge freeze begins 2026-03-05 for mobile release cut

reference

Pointers to external systems and where to find information.

"Check the Linear project INGEST if you want context on these tickets"

→ saves: pipeline bugs are tracked in Linear project "INGEST"

Alongside the individual memory files sits MEMORY.md — an index file. Each line is a pointer: a title, a link to the file, and a one-line hook. Think of it as a table of contents. It is always loaded at the start of every conversation, truncated to 200 lines or 25KB, whichever comes first.
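To make this concrete, here is what a feedback memory file and its matching index line might look like. The three frontmatter fields come from the leak; the exact body formatting and index-line style are assumptions for illustration.

```markdown
<!-- feedback_testing.md: one memory file (body formatting is illustrative) -->
---
name: feedback_testing
description: Integration tests must hit a real database, not mocks
type: feedback
---
Integration tests must run against a real database. Mocked tests passed
last quarter while the prod migration failed, so mocks are off-limits
for anything touching persistence.

<!-- The matching one-line pointer in MEMORY.md (style is an assumption) -->
- [Testing feedback](feedback_testing.md): integration tests must hit a real database, not mocks
```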

Notably absent from the memory types: code patterns, architecture, file paths, git history, debugging recipes. The system explicitly excludes anything derivable from the current state of the codebase. Memory is reserved for what the code can't tell you.

Read Path 1: Chat Start

When a new conversation begins, two things happen.

First, what Anthropic calls behavioral instructions. These instructions are loaded into the system prompt.

Full main agent read instructions

When to access memories:

  • When memories seem relevant, or the user references prior-conversation work.
  • You MUST access memory when the user explicitly asks you to check, recall, or remember.
  • If the user says to ignore or not use memory: proceed as if MEMORY.md were empty.

Memory records can become stale over time. Before answering based solely on memory, verify that the memory is still correct by reading the current state of the files or resources. If a recalled memory conflicts with current information, trust what you observe now — and update or remove the stale memory.

A memory that names a specific function, file, or flag is a claim that it existed when the memory was written. It may have been renamed, removed, or never merged. Before recommending it: if the memory names a file path, check the file exists. If the memory names a function or flag, grep for it.

"The memory says X exists" is not the same as "X exists now."

These are hardcoded — the memory type definitions, the save rules, the recall rules. The instructions are cached and don't change between turns.

Second, MEMORY.md is loaded into the user context. Not the system prompt — the user context. This gives the model the full index of what memories exist, so it knows what it can look up. If the index exceeds 200 lines or 25KB, it is hard-chopped — no summarization, no prioritization, just the first 200 lines with a warning appended.
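The hard-chop behavior can be sketched in a few lines. This is a reconstruction, not the leaked implementation; the function name and warning text are invented.

```python
MAX_LINES = 200
MAX_BYTES = 25 * 1024  # 25KB

def truncate_index(text: str) -> str:
    """Hard-chop the MEMORY.md index: no summarization, no
    prioritization, just the first 200 lines or 25KB (whichever
    limit is hit first), with a warning appended."""
    lines = text.splitlines()
    truncated = False
    if len(lines) > MAX_LINES:
        lines = lines[:MAX_LINES]
        truncated = True
    out = "\n".join(lines)
    if len(out.encode("utf-8")) > MAX_BYTES:
        out = out.encode("utf-8")[:MAX_BYTES].decode("utf-8", errors="ignore")
        truncated = True
    if truncated:
        out += "\n[memory index truncated]"
    return out
```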

This is the always-on baseline. Every conversation starts with the model knowing the rules and having the table of contents.

Read Path 2: Per-Message Prefetch

Every time you send a message, a prefetch call is made to Sonnet, running concurrently with the main API call. Step by step:

  1. Your message text is extracted. Single-word messages are skipped.
  2. scanMemoryFiles() reads the frontmatter only of every .md file in the memory directory. Not the file contents — just the name, description, and type.
  3. Those headers, along with your message, are sent to Sonnet via a side query. The prompt is simple: "Which of these are relevant? Pick up to 5."
  4. The full contents of the selected files are read from disk.
  5. Anything already in context — from prior turns or earlier tool calls — is filtered out.
  6. The survivors are injected into your message as <system-reminder> tags.

A few constraints keep this from flooding the context. There is a per-session byte budget — once cumulative injected bytes exceed a cap, surfacing stops. Files surfaced in prior turns are not surfaced again. Recently-used tools are also passed to the selector so it doesn't surface docs for tools already working fine. And the cap is 5 files per turn, chosen by a smaller, faster model (Sonnet) that only sees frontmatter.
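The whole prefetch pipeline fits in a short sketch. All names here (MemoryHeader, prefetch, select_relevant) are invented for illustration; in the real system the selector is a Sonnet side query that sees only frontmatter.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class MemoryHeader:
    path: str          # location of the .md file on disk
    name: str          # frontmatter: name
    description: str   # frontmatter: one-line description
    type: str          # frontmatter: user / feedback / project / reference

MAX_FILES_PER_TURN = 5

def prefetch(message, headers, already_in_context,
             bytes_used, byte_budget, select_relevant):
    """Per-message prefetch: ask the selector which headers match the
    message, drop files already in context, and stop surfacing once
    the per-session byte budget is exhausted."""
    if len(message.split()) <= 1:          # single-word messages are skipped
        return []
    chosen = select_relevant(message, headers)[:MAX_FILES_PER_TURN]
    injected = []
    for header in chosen:
        if header.path in already_in_context:
            continue                        # surfaced in a prior turn
        content = Path(header.path).read_text()
        if bytes_used + len(content) > byte_budget:
            break                           # per-session cap reached
        bytes_used += len(content)
        injected.append(f"<system-reminder>{content}</system-reminder>")
    return injected
```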

The <system-reminder> mechanism: these are XML tags wrapping text that get inserted into user messages, not the system prompt. The model is trained to treat content inside these tags as system-level context even though it arrives in a user turn. This lets the system inject context mid-conversation without modifying the system prompt on every turn.

Read Path 3: Tool Call Read Path

The simplest path. There is no special "memory read" tool. It is the regular Read tool — the same one used to read any file in the codebase. The model decides on its own to read a memory file, the same way it decides to read any source file.

The only memory-specific behavior: if the file being read is in the memory directory, the content returned to the model is prefixed with a note indicating when the file was created. For example, "this memory was saved 3 days ago" would be prepended to the file contents. No special routing, no special tool.
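Sketched in code, with an invented function name and an illustrative project slug:

```python
import os

MEMORY_DIR = os.path.expanduser("~/.claude/projects/my-api/memory")  # slug is illustrative

def format_memory_read(path: str, content: str, created_at: float,
                       now: float, memory_dir: str = MEMORY_DIR) -> str:
    """Read-tool output with the one memory-specific twist: files
    inside the memory directory get an age note prepended."""
    if not os.path.abspath(path).startswith(memory_dir + os.sep):
        return content                      # ordinary file: no special handling
    days = int((now - created_at) // 86400)
    return f"this memory was saved {days} days ago\n\n{content}"
```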

When does this path fire? When the model decides it should. The behavioral instructions in the system prompt tell it when to consider reading memories ("when memories seem relevant", "when the user explicitly asks you to recall"), but enforcement is purely prompt-based. The runtime doesn't orchestrate it — it is the model choosing its own sequence of tool calls.

3 read paths, 2 write paths, 1 directory of markdown files.

Write Path 1: Background Extraction

After the model produces its final text response — when it is done reasoning and has no more tool calls — the extraction mechanism fires.

A background copy of the conversation is forked. A subagent receives the messages since the last extraction (the prompt says "the most recent ~10 messages," but the runtime scopes it to messages since the last extraction ran) and a manifest of all existing memory files (frontmatter only, so it knows what already exists). It is sandboxed: it can read code files, but it can only write to the memory directory.

The subagent is instructed to be efficient — read all files it might update on turn 1, write all changes on turn 2. No interleaving, no investigation. It applies the same type taxonomy and the same exclusion rules as the main agent.

Full extraction subagent prompt

You are now acting as the memory extraction subagent. Analyze the most recent ~10 messages above and use them to update your persistent memory systems.

Available tools: Read, Grep, Glob, read-only Bash (ls/find/cat/stat/wc/head/tail and similar), and Edit/Write for paths inside the memory directory only. Bash rm is not permitted. All other tools — MCP, Agent, write-capable Bash, etc — will be denied.

You have a limited turn budget. Edit requires a prior Read of the same file, so the efficient strategy is: turn 1 — issue all Read calls in parallel for every file you might update; turn 2 — issue all Write/Edit calls in parallel. Do not interleave reads and writes across multiple turns.

You MUST only use content from the last ~10 messages to update your persistent memories. Do not waste any turns attempting to investigate or verify that content further — no grepping source files, no reading code to confirm a pattern exists, no git commands.

Several conditions gate whether extraction runs at all:

  • If the main agent already wrote memory files during this turn, extraction is skipped — no need to double-write.
  • Sub-agents don't trigger extraction, only the main agent does.
  • Extraction is skipped in remote mode.
  • A turn counter throttles frequency (default: every turn, but configurable).
  • If one extraction is already running when another turn ends, it queues a single trailing run rather than stacking up.
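The gating conditions above collapse into a single predicate. The TurnState field names are invented for illustration; the logic mirrors the bullets.

```python
from dataclasses import dataclass

@dataclass
class TurnState:
    main_agent_wrote_memory: bool  # model already wrote via Write/Edit this turn
    is_subagent: bool              # subagent turns never trigger extraction
    remote_mode: bool              # extraction disabled in remote mode
    turns_since_extraction: int    # for the frequency throttle
    extraction_running: bool       # a previous extraction is still in flight

def should_extract(state: TurnState, every_n_turns: int = 1):
    """Returns (run_now, queue_trailing_run)."""
    if state.main_agent_wrote_memory or state.is_subagent or state.remote_mode:
        return (False, False)
    if state.turns_since_extraction < every_n_turns:
        return (False, False)      # throttled by the turn counter
    if state.extraction_running:
        return (False, True)       # queue a single trailing run, don't stack
    return (True, False)
```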

Write Path 2: Model Writes Directly

The model can also write memories during its normal reasoning, by calling the Write or Edit tool on a file in the memory directory. Same as any other file write — no special mechanism.

The behavioral instructions in the system prompt tell the model the two-step save process: first, write the memory file with proper frontmatter; second, add a pointer to MEMORY.md. The model decides on its own when to do this — typically when the user explicitly asks it to remember something, or when it encounters information that matches the save criteria for one of the four memory types.

Full main agent write instructions

You have a persistent, file-based memory system at ~/.claude/projects/<slug>/memory/. This directory already exists — write to it directly with the Write tool (do not run mkdir or check for its existence).

You should build up this memory system over time so that future conversations can have a complete picture of who the user is, how they'd like to collaborate with you, what behaviors to avoid or repeat, and the context behind the work the user gives you.

If the user explicitly asks you to remember something, save it immediately as whichever type fits best. If they ask you to forget something, find and remove the relevant entry.

How to save memories

Saving a memory is a two-step process:

Step 1 — write the memory to its own file (e.g., user_role.md, feedback_testing.md) using YAML frontmatter with name, description, and type fields.

Step 2 — add a pointer to that file in MEMORY.md. MEMORY.md is an index, not a memory — each entry should be one line, under ~150 characters.
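The two-step save reduces to a handful of lines. This is a sketch of the procedure the instructions describe, not leaked code; the index-line format is an assumption.

```python
import os

def save_memory(memory_dir: str, name: str, description: str,
                mem_type: str, body: str) -> None:
    """Step 1: write the memory to its own file with YAML frontmatter.
    Step 2: append a one-line pointer to the MEMORY.md index."""
    path = os.path.join(memory_dir, f"{name}.md")
    frontmatter = (f"---\nname: {name}\ndescription: {description}\n"
                   f"type: {mem_type}\n---\n")
    with open(path, "w") as f:
        f.write(frontmatter + body + "\n")
    # Index entries stay one line, under ~150 characters.
    entry = f"- [{name}]({name}.md): {description}"[:150]
    with open(os.path.join(memory_dir, "MEMORY.md"), "a") as f:
        f.write(entry + "\n")
```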

Summary

The overall design philosophy is clear: memories are short text summaries stored as markdown files on disk, indexed in MEMORY.md, and the model is given ordinary read/write access to create and retrieve them. The files carry YAML frontmatter for metadata, and you can open the memory directory and read exactly what the model knows about you. Retrieval follows a two-step approach to save on cost: the cheap path (frontmatter scan + Sonnet selection) runs on every message, while the expensive path (full file reads) only happens for the at most five files that pass the filter.

What's Missing

As a model of memory, Claude Code's approach is closer to a filing cabinet than the human brain. Comparing it to how human memory actually works, the following fundamental gaps emerge.

Reactive, not proactive

Claude's memory only activates when you send a message — nothing surfaces on its own. Human memory is spontaneous. Memories come to mind without us searching, triggered by our ambience: who we're with, where we are, what we're doing. The Sonnet prefetch sees your message text and nothing else — there is no ambient context, no sense of location, time of day, company, activity, or mood. Claude has no ambience.

200 lines, not lifelong

The MEMORY.md index truncates at 200 lines. Individual memory files are short texts. This works for project-level preferences, but not for a lifetime of personal memories — petabytes of interactions, photos, conversations, experiences. The architecture has a hard ceiling. And there is no learning from use — a memory saved six months ago has the same standing as one saved yesterday. There is no strengthening through recall, no decay, no adaptation based on whether surfaced memories were actually useful.

Retrieval is semantic, not associative

Sonnet matches frontmatter descriptions against your message — essentially semantic text search. For a memory to surface, your prompt must overlap closely enough with the memory's one-line description for the selector to notice. In sharp contrast, human memory is associative: one memory triggers another, forming chains of recall across multiple hops. Claude's retrieval is flat. There is no chaining, no "this reminds me of that."

Claude Code's memory system is a pragmatic engineering solution to a real limitation — Large Language Models are stateless, and were not designed to model human memory. As such, it addresses the immediate need. However, the architecture does not attempt to capture the three properties that cognitive science identifies as fundamental to human memory — proactive recall, lifelong retention, and associative retrieval. Bridging this gap between information storage and true memory remains an open and important problem.