How to Use Letta (MemGPT)
Installing Letta, server vs cloud, ADE visual tool, building agents with long-term memory, archival storage.
Letta (formerly MemGPT) is an open-source framework for stateful agents — LLMs that manage their own long-term memory across conversations.
MemGPT started as a Berkeley research project that gave LLMs an operating-system-style memory hierarchy: a small in-context working set, a larger archival store, and tools to page between them. It rebranded as Letta in 2024 and now ships a server, a Python/TypeScript SDK, and the Agent Development Environment (ADE) — a visual debugger for stateful agents.
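The hierarchy is easiest to see in miniature. Here is a toy sketch of the idea (the class and method names are illustrative, not Letta's actual API): a small in-context working set with a hard budget, an unbounded archival store, and explicit paging between them.

```python
# Toy MemGPT-style memory hierarchy: a bounded in-context working set
# plus an archival store, with oldest facts "paged out" on overflow.

class ToyMemoryHierarchy:
    def __init__(self, context_budget=3):
        self.context_budget = context_budget  # max facts kept in-context
        self.working_set = []                 # always visible to the LLM
        self.archival = []                    # searched only on demand

    def remember(self, fact):
        self.working_set.append(fact)
        # On overflow, page the oldest fact out to archival storage
        # instead of silently dropping it.
        while len(self.working_set) > self.context_budget:
            self.archival.append(self.working_set.pop(0))

    def recall(self, keyword):
        # Check the in-context working set first, then fall back
        # to a search over archival memory.
        hits = [f for f in self.working_set if keyword in f]
        return hits or [f for f in self.archival if keyword in f]

mem = ToyMemoryHierarchy(context_budget=2)
for fact in ["likes tea", "name is Jay", "builds SEO tools"]:
    mem.remember(fact)

print(mem.working_set)   # ['name is Jay', 'builds SEO tools']
print(mem.recall("tea")) # ['likes tea'] -- paged out, but still findable
```

Letta's real implementation backs the archival tier with a vector store and gives the agent tools to trigger the paging itself, but the shape of the system is the same.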
What it is
Letta runs a persistent server that owns agent state: core memory blocks (persona, human), archival memory (vector store), and message history. Agents are addressable by ID and survive restarts. You talk to them over REST or WebSocket; they call tools, update their own memory blocks, and keep learning across sessions.
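Because agents are addressable by ID, a client interaction reduces to POSTing messages at an agent's endpoint. A minimal sketch of building such a request, without sending it (the endpoint path and payload shape follow Letta's v1 REST API, but treat both as assumptions and check the API reference for your server version; the agent ID is hypothetical):

```python
# Construct a REST request for a Letta agent without sending it.
import json

BASE_URL = "http://localhost:8283"
AGENT_ID = "agent-00000000"  # hypothetical; list your agents to get a real ID

def build_message_request(agent_id, text):
    """Return (url, body) for sending one user message to an agent."""
    url = f"{BASE_URL}/v1/agents/{agent_id}/messages"
    body = {"messages": [{"role": "user", "content": text}]}
    return url, json.dumps(body)

url, body = build_message_request(AGENT_ID, "Hi, I'm Jay")
print(url)
# Sending it would be e.g.:
#   requests.post(url, data=body,
#                 headers={"Content-Type": "application/json"})
```

The key point is that the server, not your process, owns the conversation: you send one message to an ID and get the agent's turn back, with memory edits already persisted.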
Install / sign up
# Docker (recommended)
docker run -it -p 8283:8283 \
  -v ~/.letta:/root/.letta \
  letta/letta:latest

# Or pip
pip install letta
letta server

# Cloud option: https://app.letta.com (managed)
First session
Open the ADE at http://localhost:8283, create an agent, and start chatting. Watch the memory panel on the right — when you mention your name, you’ll see the agent update its “human” block in real time.
$ letta run
> Hi, I'm Jay and I build SEO tools.
# agent writes to core memory:
#   human: "Name is Jay. Builds SEO tools."
> What do I work on?
# agent recalls from core memory, not context window
Everyday workflows
1. Build a personal assistant that remembers preferences across weeks of chats.
2. Give agents tools (Python functions or MCP servers) so they can act, not just remember.
3. Use the ADE to inspect memory edits and step through tool calls when debugging.
Gotchas and tips
Archival memory uses a vector store (pgvector by default) — point it at a durable Postgres in production, not the in-container SQLite, or you’ll lose memories on restart. Letta supports any OpenAI-compatible endpoint, so local models via Ollama or vLLM work fine for privacy-sensitive deployments.
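In practice that means passing the server a Postgres connection string at startup. A sketch, assuming the `LETTA_PG_URI` environment variable from Letta's self-hosting docs (verify the variable name and the credentials against your version; the host and password here are placeholders):

```shell
# Run the server against a durable Postgres (with pgvector) instead of
# the default in-container database:
docker run -it -p 8283:8283 \
  -e LETTA_PG_URI="postgresql://letta:secret@db.internal:5432/letta" \
  letta/letta:latest
```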
Core memory blocks are small (a few KB) on purpose — they’re always in context. Push larger facts into archival and let the agent retrieve them. The agent’s self-editing of memory is powerful but occasionally overwrites useful info; version your memory blocks via the API if that matters.
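The discipline above can be sketched as a simple routing rule: facts go into the always-in-context core block only while it stays under a byte budget, and everything else goes to archival. The budget value and function are illustrative, not Letta's internals.

```python
# Route facts by size: small ones into core memory, big ones to archival.

CORE_BUDGET_BYTES = 2000  # illustrative; core blocks are a few KB at most

def route_fact(core: str, archival: list, fact: str):
    """Append fact to core memory if it fits the budget, else archive it."""
    if len((core + "\n" + fact).encode("utf-8")) <= CORE_BUDGET_BYTES:
        return core + "\n" + fact, archival
    return core, archival + [fact]

core, archival = "Name is Jay.", []
core, archival = route_fact(core, archival, "Builds SEO tools.")
core, archival = route_fact(core, archival, "X" * 5000)  # too big for core
print(len(archival))  # 1
```

Letta's agents make this call themselves via their memory-editing tools, which is why versioning blocks through the API is worth it when a stray self-edit would be costly.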
Who it’s for
Builders of long-lived assistants, companion apps, customer-support bots, and any product where “the agent remembers you” is the core value prop.