How to Use Pydantic AI
Define agents with validated result_types, inject dependencies, add tools, and stream responses.
Pydantic AI is a Python agent framework from the team behind Pydantic. It treats LLM output like any other untrusted input — validate it against a schema, retry on failure, and let the type checker catch your mistakes. If you already use Pydantic for FastAPI request bodies, Pydantic AI feels like the obvious extension to agents and tool calls.
What Pydantic AI actually is
Pydantic AI is a thin, typed wrapper around model APIs (OpenAI, Anthropic, Gemini, Groq, Ollama, Bedrock) that forces every response through a Pydantic model. You define an Agent with a result_type, bind tools as decorated Python functions, and the framework handles JSON-schema generation, validation, and retry loops. The result is a typed object whose fields you read as attributes, with full IDE autocomplete, instead of indexing into response["choices"][0][...].
Compared to LangChain it is smaller, more opinionated, and actually typed. Compared to raw API calls it gives you structured output, automatic retries on schema mismatch, and a standard place to hang dependencies (database sessions, API clients) via its deps_type system.
Installing
pip install pydantic-ai
# or, to pull in only one model provider's dependencies, use the slim package with an extra
pip install "pydantic-ai-slim[openai]"
pip install "pydantic-ai-slim[anthropic]"
Set the provider API key in your environment (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.). Python 3.9 or newer.
First working example
from pydantic import BaseModel
from pydantic_ai import Agent

class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str
    due_date: str

agent = Agent(
    "openai:gpt-4o-mini",
    result_type=Invoice,
    system_prompt="Extract invoice fields from the user message.",
)

result = agent.run_sync(
    "Acme Corp billed us 1,249.00 EUR, due 2026-05-15."
)
print(result.data)
# Invoice(vendor='Acme Corp', total=1249.0, currency='EUR', due_date='2026-05-15')

No JSON parsing, no try/except around json.loads, no “the model returned prose again.” If the model emits invalid JSON or the wrong shape, Pydantic AI retries with the validation error as feedback, up to retries=1 by default.
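To make the validate-and-retry idea concrete, here is a minimal standard-library sketch of the loop. It is not Pydantic AI's implementation: parse_invoice, run_with_retries, and scripted_model are hypothetical names invented for this illustration, the "model" is a scripted fake, and the hand-written field checks stand in for real Pydantic validation.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Invoice:
    vendor: str
    total: float
    currency: str
    due_date: str


def parse_invoice(raw: str) -> Invoice:
    """Parse one model reply; raise ValueError if it is not a valid invoice."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    expected = {"vendor": str, "total": (int, float), "currency": str, "due_date": str}
    missing = sorted(set(expected) - set(payload))
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name, types in expected.items():
        value = payload[name]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, types):
            raise ValueError(f"field {name!r} has the wrong type")
    return Invoice(payload["vendor"], float(payload["total"]),
                   payload["currency"], payload["due_date"])


def run_with_retries(call_model: Callable[[str], str], prompt: str, retries: int = 1) -> Invoice:
    """Ask the model; on a validation failure, ask again with the error attached."""
    message = prompt
    for attempt in range(retries + 1):
        raw = call_model(message)
        try:
            return parse_invoice(raw)
        except ValueError as err:
            if attempt == retries:
                raise  # out of retries: surface the last validation error
            message = f"{prompt}\n\nYour previous reply was rejected: {err}. Reply with JSON only."
    raise AssertionError("unreachable")


def scripted_model(replies: List[str]) -> Tuple[Callable[[str], str], List[str]]:
    """Hypothetical fake model: returns the replies in order and records each prompt."""
    seen: List[str] = []
    queue = list(replies)

    def call(message: str) -> str:
        seen.append(message)
        return queue.pop(0)

    return call, seen


call_model, prompts_seen = scripted_model([
    "Sure! The vendor is Acme Corp.",  # prose instead of JSON: fails validation
    '{"vendor": "Acme Corp", "total": 1249.0, "currency": "EUR", "due_date": "2026-05-15"}',
])
invoice = run_with_retries(call_model, "Extract the invoice fields as JSON.")
print(invoice)
print(len(prompts_seen))  # two model calls: the original and one retry
```

The second prompt carries the validation error back to the model, which is the feedback step described above. With retries=1, a second bad reply raises instead of looping forever.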
A real workflow — tools and dependencies
Agents become useful when they can call functions. Register tools with @agent.tool; Pydantic AI derives the JSON schema from the signature.
from dataclasses import dataclass
from pydantic_ai import Agent, RunContext

@dataclass
class Deps:
    db: "Database"

support_agent = Agent(
    "anthropic:claude-sonnet-4",
    deps_type=Deps,
    system_prompt="You are a support agent. Use tools to look up customers.",
)

@support_agent.tool
async def get_customer(ctx: RunContext[Deps], email: str) -> dict:
    """Fetch a customer row by email."""
    return await ctx.deps.db.fetch_one(
        "SELECT id, plan, mrr FROM customers WHERE email = $1", email
    )

async def handle_ticket(db, question: str):
    result = await support_agent.run(question, deps=Deps(db=db))
    return result.data

The RunContext gives tools typed access to the shared deps. No global state, no monkey-patching, no LangChain callback handlers: just a dataclass you pass in.
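One practical payoff of passing dependencies through the context is that a tool can be exercised in a test with a fake database and no model call at all. The sketch below shows that pattern using only the standard library. Ctx is a hypothetical stand-in for RunContext, and FakeDatabase with its fetch_customer method is invented for illustration; neither is part of Pydantic AI.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, Generic, Optional, TypeVar

DepsT = TypeVar("DepsT")


@dataclass
class Ctx(Generic[DepsT]):
    """Hypothetical stand-in for RunContext: it only carries the deps object."""
    deps: DepsT


class FakeDatabase:
    """In-memory replacement for a real async database client (assumed interface)."""

    def __init__(self, rows: Dict[str, dict]) -> None:
        self._rows = rows

    async def fetch_customer(self, email: str) -> Optional[dict]:
        return self._rows.get(email)


@dataclass
class Deps:
    db: FakeDatabase


async def get_customer(ctx: Ctx[Deps], email: str) -> Optional[dict]:
    """Fetch a customer row by email."""
    # The tool reaches the database only through ctx.deps, never through a global.
    return await ctx.deps.db.fetch_customer(email)


fake_deps = Deps(db=FakeDatabase({"ada@example.com": {"id": 1, "plan": "pro", "mrr": 49}}))
found = asyncio.run(get_customer(Ctx(fake_deps), "ada@example.com"))
not_found = asyncio.run(get_customer(Ctx(fake_deps), "nobody@example.com"))
print(found)
print(not_found)
```

Because the tool body never imports a connection or reads module state, swapping the real client for the fake is a one-line change at the call site.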
Gotchas
Streaming and structured output don’t mix cleanly. If you want token streaming, drop the result_type and stream plain strings, or use run_stream with its partial-validation API and accept that early chunks may not validate.
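A small standard-library sketch shows why early chunks may not validate: a JSON object that is still arriving is not yet parseable. This is only an illustration of the problem, not how run_stream works internally; stream_partial and the chunk boundaries are made up for the example.

```python
import json
from typing import Iterable, Iterator, Optional


def stream_partial(chunks: Iterable[str]) -> Iterator[Optional[dict]]:
    """Yield the parsed object after each chunk, or None while the JSON is incomplete."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        try:
            yield json.loads(buffer)
        except json.JSONDecodeError:
            yield None  # not yet a complete JSON document


chunks = [
    '{"vendor": "Acme',
    ' Corp", "total": 1249.0',
    ', "currency": "EUR"}',
]
snapshots = list(stream_partial(chunks))
print(snapshots)
```

Only the final snapshot parses. Anything you show the user before that point has to tolerate missing or half-written fields.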
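Retries hide costs. A validation failure roughly doubles your token bill for that turn, and often a bit more, because the retry request typically carries the failed reply and the validation error on top of the original prompt. Check result.usage() when you’re tuning prompts, especially with expensive models.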
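The arithmetic is easy to sketch. The token counts and per-million-token prices below are invented for illustration, and turn_cost is a hypothetical helper, not a Pydantic AI function; substitute the numbers your provider actually reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Attempt:
    input_tokens: int
    output_tokens: int


def turn_cost(attempts: List[Attempt], usd_per_million_in: float, usd_per_million_out: float) -> float:
    """Total cost in USD for every model call made during one turn."""
    total_in = sum(a.input_tokens for a in attempts)
    total_out = sum(a.output_tokens for a in attempts)
    return (total_in * usd_per_million_in + total_out * usd_per_million_out) / 1_000_000


first = Attempt(input_tokens=1200, output_tokens=150)
# Assumed retry shape: original prompt + failed reply (150) + error message (60).
retry = Attempt(input_tokens=1200 + 150 + 60, output_tokens=150)

clean = turn_cost([first], usd_per_million_in=3.0, usd_per_million_out=15.0)
with_retry = turn_cost([first, retry], usd_per_million_in=3.0, usd_per_million_out=15.0)
print(f"{with_retry / clean:.2f}x")
```

Under these assumed numbers, one retry costs about 2.1 times the clean turn rather than exactly 2 times, because the retry's input is larger than the original prompt.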
Tool docstrings are the prompt. The function docstring and parameter types become the JSON schema the model sees. Sloppy docstrings produce sloppy tool calls. Treat them like API docs.
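To see how a signature and docstring can turn into a tool description, here is a deliberately simplified standard-library sketch. Pydantic AI's real schema generation goes through Pydantic and handles far more types; tool_schema and the include_invoices parameter are hypothetical and exist only for this example.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func, skip=("ctx",)) -> dict:
    """Build a simplified tool description from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        if name in skip:
            continue  # the context argument is supplied by the framework, not the model
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_customer(ctx: object, email: str, include_invoices: bool = False) -> dict:
    """Fetch a customer row by email."""
    return {}


schema = tool_schema(get_customer)
print(schema)
```

The description field is the docstring verbatim, so whatever you write there is what the model reads when deciding whether and how to call the tool.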
When NOT to use it
Skip Pydantic AI if you need a huge pre-built tool ecosystem (LangChain’s integrations are still an order of magnitude bigger), if you’re staying in JavaScript/TypeScript, or if you’re doing pure RAG over documents — LlamaIndex handles that with less glue code. For small typed extract-and-tool-call services, though, Pydantic AI is the least-painful option in Python today.