Free Tool Arena


How to Use SWE-agent

Installing SWE-agent, using the agent-computer interface (ACI), running it on SWE-bench, configuring models, and controlling cost.

Updated April 2026 · 6 min read

SWE-agent is Princeton’s autonomous software-engineering agent that takes a GitHub issue and a repo, then writes, runs, and tests a patch end-to-end without human hand-holding.


SWE-agent is an open-source framework from the Princeton NLP group, built to solve real software-engineering tasks by driving a language model through a specially designed Agent-Computer Interface (ACI). It was the first agent to crack double-digit resolve rates on SWE-bench, a benchmark of real GitHub issues from popular Python repos. Researchers use it to study agent capabilities, teams use it to triage bug backlogs, and CTF players use the EnIGMA spin-off for capture-the-flag challenges. It’s MIT-licensed and maintained by the SWE-agent authors.

What it is

The core insight is the ACI: instead of giving a model raw shell access, SWE-agent exposes narrow, high-feedback commands (open, goto, edit, find_file, search_dir, submit) that a model can actually use well. It wraps these in a sandboxed Docker environment, runs the agent loop against providers like Claude, GPT, or any LiteLLM-supported model, and emits a patch plus a full trajectory log. Configuration lives in YAML files so you can swap prompts, tools, and models without touching code.
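To make that concrete, here is a sketch of what such a YAML config looks like. The exact schema varies between releases, so treat the field names below as illustrative rather than authoritative — the configs bundled with the repo are the ground truth for your installed version:

```yaml
# Illustrative sketch only -- field names follow the spirit of SWE-agent's
# YAML configs but may not match your installed version exactly.
agent:
  model:
    name: claude-sonnet-4         # any LiteLLM-supported model id
    per_instance_cost_limit: 2.0  # hypothetical per-task cost cap, in USD
  templates:
    system_template: |
      You are an autonomous software engineer. Use the provided
      commands (open, goto, edit, search_dir, submit) to fix the issue.
```

Because prompts, tools, and model settings all live here, swapping any of them is a config change rather than a code change.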

Install

git clone https://github.com/SWE-agent/SWE-agent.git
cd SWE-agent
pip install --editable .
# Docker must be installed and running for sandboxed execution
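Before the first run, it is worth confirming the two prerequisites the agent depends on: the `sweagent` CLI on your PATH and a reachable Docker daemon. A minimal check (the status strings below are our own, not SWE-agent output):

```shell
# Verify the `sweagent` CLI landed on PATH after the editable install.
if command -v sweagent >/dev/null 2>&1; then
  cli_status="installed"
else
  cli_status="missing"
fi

# Verify the Docker daemon is reachable; sandboxed runs fail without it.
if docker info >/dev/null 2>&1; then
  docker_status="running"
else
  docker_status="not running"
fi

echo "sweagent CLI: $cli_status"
echo "docker daemon: $docker_status"
```

If either line reports a problem, fix it before the first run — a missing daemon surfaces later as a confusing sandbox error rather than a clean failure.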

First run

Point the agent at a live GitHub issue and watch it clone the repo, reproduce the bug, edit files, and emit a patch. Set your API key first.

$ export ANTHROPIC_API_KEY=sk-ant-...
$ sweagent run \
  --agent.model.name=claude-sonnet-4 \
  --problem_statement.github_url=https://github.com/pvlib/pvlib-python/issues/1603
[INFO] Cloned repo to /tmp/...
[INFO] Step 1: open pvlib/iotools/psm3.py
[INFO] Step 7: submit
[DONE] Patch written to trajectories/<run-id>/patch.diff

Everyday workflows

  • Batch SWE-bench — run sweagent run-batch against the dataset to reproduce benchmark numbers locally.
  • Fix local issues — pass --problem_statement.path to a text file describing a bug in your own codebase.
  • Swap models — edit the YAML to try Claude, GPT-4o, DeepSeek, or a local model through LiteLLM without changing agent logic.
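For the local-issue workflow, the problem statement is just a text file. A hedged sketch — the file contents, repo path, and bug named here are made up for illustration, and exact option names can shift between releases (`sweagent run --help` has the current list):

```shell
# Write a plain-text bug report for the agent to work from.
# The file name and bug description are hypothetical examples.
cat > bug_report.md <<'EOF'
parse_items([]) in utils/parse.py raises TypeError; it should return [].
EOF

# Hypothetical invocation -- options mirror the first-run example above but
# may vary by version. Guarded so the snippet is a no-op if the CLI is absent.
if command -v sweagent >/dev/null 2>&1; then
  sweagent run \
    --agent.model.name=claude-sonnet-4 \
    --env.repo.path=. \
    --problem_statement.path=bug_report.md
fi
```

The more reproduction detail you put in the file — failing command, expected versus actual behavior — the fewer steps the agent burns rediscovering it.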

Gotchas and tips

Cost is real: a single SWE-bench instance can burn 50k–200k tokens on frontier models, and full-dataset runs get expensive fast. Start with ten instances to calibrate, and cache the Docker environments — rebuilding them for every task dominates wall-clock time on a cold machine. Trajectories are verbose JSON; browse them with the included inspector_web tool rather than tailing raw files.
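To put those token numbers in perspective, here is a back-of-envelope estimate for a ten-instance calibration run. The per-token price is purely illustrative — check your provider's current pricing, and note that output tokens usually cost more than input:

```shell
# Assumed averages -- tune these to what your own calibration run shows.
tokens_per_instance=120000   # mid-range of the 50k-200k spread above
instances=10
price_per_million_usd=3      # illustrative input-token price

total_tokens=$(( tokens_per_instance * instances ))
approx_cost_usd=$(( total_tokens * price_per_million_usd / 1000000 ))

echo "total tokens: ${total_tokens}"
echo "approx cost: \$${approx_cost_usd}"
```

Multiply by the size of the full dataset and it is obvious why calibrating on a small slice first is the standard advice.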

The agent is tuned for Python repos and pytest-style test suites. Non-Python languages and custom build systems work but often need a custom YAML with the right install and test commands. Pin the SWE-agent version if you’re publishing results — behavior shifts meaningfully between releases as prompts are refined.
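For a non-Python repo, the pieces that usually need overriding are how the environment is built and how tests are run. An illustrative YAML fragment — the key names here are a guess at the shape, not the exact schema, so start from one of the shipped configs rather than writing from scratch:

```yaml
# Illustrative only: key names may differ in your SWE-agent release.
env:
  deployment:
    image: node:20          # hypothetical base image for a JS repo
  post_startup_commands:
    - npm ci                # install step instead of pip
agent:
  templates:
    instance_template: |
      Run `npm test` to check your fix before submitting.
```

The same override points cover custom build systems in Python projects too, such as repos that need compiled extensions before pytest will pass.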

Who it’s for

SWE-agent fits researchers benchmarking agent capabilities and engineering teams curious about autonomous bug-fixing on Python codebases. Read the ACI paper before your first serious run — understanding why the commands are shaped the way they are will save you from fighting the framework.

