Free Tool Arena


How to Build an Agent with the OpenAI Agents SDK

Build a working agent in Python using OpenAI's Agents SDK — tools, handoffs, guardrails, and the model-native sandbox harness.

Updated April 2026 · 6 min read

The OpenAI Agents SDK is the production successor to the old Swarm experiments — a small, sharp Python and TypeScript library for building agents that use tools, hand off to other agents, and run inside a sandbox. It’s what you reach for when ChatGPT agent mode stops being enough and you want your code orchestrating the model.

This guide walks from pip install to a working agent that does something real, with the four primitives you actually need in April 2026: Agents, Tools, Handoffs, Guardrails.


Prerequisites

  • Python 3.10 or newer.
  • An OpenAI API key (OPENAI_API_KEY) with billing set up.
  • A spend cap on the key. Set it before you write a line of code.

Step 1 — Install the SDK

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install openai-agents

The package name is openai-agents. Don’t confuse it with the older openai core library — you’ll use both.

Step 2 — The smallest working agent

Paste this into agent.py:

from agents import Agent, Runner

agent = Agent(
    name="Tutor",
    instructions="You are a patient tutor. Answer in <= 3 sentences.",
)

if __name__ == "__main__":
    result = Runner.run_sync(agent, "Explain eigenvectors like I'm 12.")
    print(result.final_output)

Run with python agent.py. If it prints an explanation, you’re done with step 2 — the SDK, the key, and the model are all wired up.

Step 3 — Add a tool

Tools are functions the agent can call. Decorate them with @function_tool and they show up in the model’s function list automatically.

from agents import Agent, Runner, function_tool

@function_tool
def word_count(text: str) -> int:
    """Count words in a string."""
    return len(text.split())

agent = Agent(
    name="Editor",
    instructions="When asked about length, call word_count.",
    tools=[word_count],
)

Run it on “How long is ‘the quick brown fox’?”. The model decides to call word_count, the SDK runs the Python, the result flows back into the conversation.

Step 4 — Add a handoff

Handoffs let an agent delegate to a specialist. Instead of one giant prompt, you compose small agents.

from agents import Agent

math_agent = Agent(name="Math", instructions="Solve step-by-step.")
writing_agent = Agent(name="Writing", instructions="Edit for clarity.")

triage = Agent(
    name="Triage",
    instructions="Hand off to math for math, writing for prose.",
    handoffs=[math_agent, writing_agent],
)

The triage agent reads the user’s message, decides which specialist to invoke, and the SDK transfers the conversation. You debug each specialist in isolation — this is the biggest reason to use the SDK over a single mega-prompt.

Step 5 — Add a guardrail

Guardrails are validators the SDK runs on inputs and outputs.

from agents import input_guardrail, GuardrailFunctionOutput

@input_guardrail
def no_secrets(ctx, agent, input_str):
    banned = ("api_key", "password", "ssn")
    tripped = any(b in input_str.lower() for b in banned)
    return GuardrailFunctionOutput(
        output_info={"flagged": tripped},
        tripwire_triggered=tripped,
    )

Attach it to your agent with Agent(…, input_guardrails=[no_secrets]). If the guardrail trips, the SDK raises before the model ever sees the prompt — cheap, fast, and logged.

Step 6 — Sandbox code and file work

The April 2026 Agents SDK ships with a model-native harness: the agent can inspect files, edit them, run shell commands, and iterate on long-horizon tasks inside a sandbox your process controls. This is the feature to reach for when you’re tempted to give an agent raw shell access: don’t; use the harness instead.

Step 7 — Deploy

The same Python file runs on your laptop, on Fly.io, on a VPS, in Lambda. Wrap it in a FastAPI handler for a webhook, a scheduled job for a cron, or a CLI for humans. The SDK stays the same — that’s the point of it being lightweight.

Pitfalls I’ve seen

  • Giant single agents. Break them up with handoffs early. Debugging a 600-line system prompt is misery.
  • Tools that do five things. Keep each tool to one responsibility — the model picks them better.
  • No turn cap. Set max_turns on the runner. An infinite-loop agent at 3am is an expensive learning experience.

Once you’re comfortable, compare the flow to the Claude Agent SDK — same primitives, different model strengths, and MCP as the tool standard. Also run your prompts through our token counter so you know what each turn costs before you put the thing on a cron.