How to Use Letta (MemGPT)
Installing Letta, server vs cloud, ADE visual tool, building agents with long-term memory, archival storage.
Letta (formerly MemGPT) is an open-source framework for stateful agents — LLMs that manage their own long-term memory across conversations.
MemGPT started as a Berkeley research project that gave LLMs an operating-system-style memory hierarchy: a small in-context working set, a larger archival store, and tools to page between them. It rebranded as Letta in 2024 and now ships a server, a Python/TypeScript SDK, and the Agent Development Environment (ADE) — a visual debugger for stateful agents.
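The hierarchy is easiest to see in miniature. Here is a toy sketch of the idea (the class and method names are illustrative, not Letta's actual API): a small in-context working set with a hard budget, an unbounded archival store, and explicit paging between them.

```python
# Toy MemGPT-style memory hierarchy: a bounded in-context working set
# plus an archival store, with oldest facts "paged out" on overflow.

class ToyMemoryHierarchy:
    def __init__(self, context_budget=3):
        self.context_budget = context_budget  # max facts kept in-context
        self.working_set = []                 # always visible to the LLM
        self.archival = []                    # searched only on demand

    def remember(self, fact):
        self.working_set.append(fact)
        # On overflow, page the oldest fact out to archival storage
        # instead of silently dropping it.
        while len(self.working_set) > self.context_budget:
            self.archival.append(self.working_set.pop(0))

    def recall(self, keyword):
        # Check the in-context working set first, then fall back
        # to a search over archival memory.
        hits = [f for f in self.working_set if keyword in f]
        return hits or [f for f in self.archival if keyword in f]

mem = ToyMemoryHierarchy(context_budget=2)
for fact in ["likes tea", "name is Jay", "builds SEO tools"]:
    mem.remember(fact)

print(mem.working_set)   # ['name is Jay', 'builds SEO tools']
print(mem.recall("tea")) # ['likes tea'] -- paged out, but still findable
```

Letta's real implementation backs the archival tier with a vector store and gives the agent tools to trigger the paging itself, but the shape of the system is the same.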
What it is
Letta runs a persistent server that owns agent state: core memory blocks (persona, human), archival memory (vector store), and message history. Agents are addressable by ID and survive restarts. You talk to them over REST or WebSocket; they call tools, update their own memory blocks, and keep learning across sessions.
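Because agents are addressable by ID, a client interaction reduces to POSTing messages at an agent's endpoint. A minimal sketch of building such a request, without sending it (the endpoint path and payload shape follow Letta's v1 REST API, but treat both as assumptions and check the API reference for your server version; the agent ID is hypothetical):

```python
# Construct a REST request for a Letta agent without sending it.
import json

BASE_URL = "http://localhost:8283"
AGENT_ID = "agent-00000000"  # hypothetical; list your agents to get a real ID

def build_message_request(agent_id, text):
    """Return (url, body) for sending one user message to an agent."""
    url = f"{BASE_URL}/v1/agents/{agent_id}/messages"
    body = {"messages": [{"role": "user", "content": text}]}
    return url, json.dumps(body)

url, body = build_message_request(AGENT_ID, "Hi, I'm Jay")
print(url)
# Sending it would be e.g.:
#   requests.post(url, data=body,
#                 headers={"Content-Type": "application/json"})
```

The key point is that the server, not your process, owns the conversation: you send one message to an ID and get the agent's turn back, with memory edits already persisted.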
Install / sign up
# Docker (recommended)
docker run -it -p 8283:8283 \
  -v ~/.letta:/root/.letta \
  letta/letta:latest

# Or pip
pip install letta
letta server

# Cloud option: https://app.letta.com (managed)
First session
Open the ADE at http://localhost:8283, create an agent, and start chatting. Watch the memory panel on the right — when you mention your name, you’ll see the agent update its “human” block in real time.
$ letta run
> Hi, I'm Jay and I build SEO tools.
# agent writes to core memory:
#   human: "Name is Jay. Builds SEO tools."
> What do I work on?
# agent recalls from core memory, not context window
Everyday workflows
1. Build a personal assistant that remembers preferences across weeks of chats.
2. Give agents tools (Python functions or MCP servers) so they can act, not just remember.
3. Use the ADE to inspect memory edits and step through tool calls when debugging.
Gotchas and tips
Archival memory uses a vector store (pgvector by default) — point it at a durable Postgres in production, not the in-container SQLite, or you’ll lose memories on restart. Letta supports any OpenAI-compatible endpoint, so local models via Ollama or vLLM work fine for privacy-sensitive deployments.
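In practice that means passing the server a Postgres connection string at startup. A sketch, assuming the `LETTA_PG_URI` environment variable from Letta's self-hosting docs (verify the variable name and the credentials against your version; the host and password here are placeholders):

```shell
# Run the server against a durable Postgres (with pgvector) instead of
# the default in-container database:
docker run -it -p 8283:8283 \
  -e LETTA_PG_URI="postgresql://letta:secret@db.internal:5432/letta" \
  letta/letta:latest
```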
Core memory blocks are small (a few KB) on purpose — they’re always in context. Push larger facts into archival and let the agent retrieve them. The agent’s self-editing of memory is powerful but occasionally overwrites useful info; version your memory blocks via the API if that matters.
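The discipline above can be sketched as a simple routing rule: facts go into the always-in-context core block only while it stays under a byte budget, and everything else goes to archival. The budget value and function are illustrative, not Letta's internals.

```python
# Route facts by size: small ones into core memory, big ones to archival.

CORE_BUDGET_BYTES = 2000  # illustrative; core blocks are a few KB at most

def route_fact(core: str, archival: list, fact: str):
    """Append fact to core memory if it fits the budget, else archive it."""
    if len((core + "\n" + fact).encode("utf-8")) <= CORE_BUDGET_BYTES:
        return core + "\n" + fact, archival
    return core, archival + [fact]

core, archival = "Name is Jay.", []
core, archival = route_fact(core, archival, "Builds SEO tools.")
core, archival = route_fact(core, archival, "X" * 5000)  # too big for core
print(len(archival))  # 1
```

Letta's agents make this call themselves via their memory-editing tools, which is why versioning blocks through the API is worth it when a stray self-edit would be costly.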
Who it’s for
Builders of long-lived assistants, companion apps, customer-support bots, and any product where “the agent remembers you” is the core value prop.