How to Use SWE-agent
Installing SWE-agent, the agent-computer interface (ACI), running on SWE-bench, configuring models, cost control.
SWE-agent is Princeton’s autonomous software-engineering agent that takes a GitHub issue and a repo, then writes, runs, and tests a patch end-to-end without human hand-holding.
SWE-agent is an open-source framework from the Princeton NLP group, built to solve real software-engineering tasks by driving a language model through a specially designed Agent-Computer Interface (ACI). It was the first agent to crack double-digit resolve rates on SWE-bench, a benchmark built from real GitHub issues in popular Python repos. Researchers use it to study agent capabilities, teams use it to triage bug backlogs, and CTF players use the EnIGMA spin-off for capture-the-flag challenges. It’s MIT-licensed and maintained by the SWE-agent authors.
What it is
The core insight is the ACI: instead of giving a model raw shell access, SWE-agent exposes narrow, high-feedback commands (open, goto, edit, find_file, search_dir, submit) that a model can actually use well. It wraps these in a sandboxed Docker environment, runs the agent loop against providers like Claude, GPT, or any LiteLLM-supported model, and emits a patch plus a full trajectory log. Configuration lives in YAML files so you can swap prompts, tools, and models without touching code.
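Because configuration is plain YAML, swapping the model or prompt is a file edit rather than a code change. A minimal sketch of what such a file looks like — the agent.model.name key mirrors the CLI flag of the same name, but the other fields and their defaults shift between releases, so treat this as illustrative and check the YAML configs shipped in the repo:

```yaml
# Minimal agent config sketch. Field names beyond agent.model.name are
# illustrative, not the verbatim SWE-agent schema -- consult the bundled
# configs for the exact keys your installed version expects.
agent:
  model:
    name: claude-sonnet-4          # any LiteLLM-supported model id
    per_instance_cost_limit: 2.00  # hypothetical per-task cost cap, USD
  templates:
    system_template: |
      You are a software engineer resolving a GitHub issue.
```

Passing a file like this (e.g. via the run command's config option) keeps model choice, prompts, and limits versioned alongside your experiments.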
Install
git clone https://github.com/SWE-agent/SWE-agent.git
cd SWE-agent
pip install --editable .
# Docker must be installed and running for sandboxed execution
First run
Point the agent at a live GitHub issue and watch it clone the repo, reproduce the bug, edit files, and emit a patch. Set your API key first.
$ export ANTHROPIC_API_KEY=sk-ant-...
$ sweagent run \
    --agent.model.name=claude-sonnet-4 \
    --problem_statement.github_url=https://github.com/pvlib/pvlib-python/issues/1603
[INFO] Cloned repo to /tmp/...
[INFO] Step 1: open pvlib/iotools/psm3.py
[INFO] Step 7: submit
[DONE] Patch written to trajectories/<run-id>/patch.diff
Everyday workflows
- Batch SWE-bench — run sweagent run-batch against the dataset to reproduce benchmark numbers locally.
- Fix local issues — pass --problem_statement.path to a text file describing a bug in your own codebase.
- Swap models — edit the YAML to try Claude, GPT-4o, DeepSeek, or a local model through LiteLLM without changing agent logic.
Gotchas and tips
Cost is real: a single SWE-bench instance can burn 50k–200k tokens on frontier models, and full-dataset runs get expensive fast. Start with ten instances to calibrate, and cache the Docker environments — rebuilding them for every task dominates wall-clock time on a cold machine. Trajectories are verbose JSON; browse them with the included inspector_web tool rather than tailing raw files.
The agent is tuned for Python repos and pytest-style test suites. Non-Python languages and custom build systems work but often need a custom YAML with the right install and test commands. Pin the SWE-agent version if you’re publishing results — behavior shifts meaningfully between releases as prompts are refined.
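A repo-specific config for a non-Python project mostly comes down to overriding the install and test commands. A hedged sketch, following the same YAML layout as above — the key names here are assumptions for illustration, not the verbatim schema, which differs between releases:

```yaml
# Hypothetical per-repo overrides for a Node.js project. Key names are
# illustrative; check your installed version's config reference before use.
env:
  repo:
    github_url: https://github.com/example/node-app  # example repo
  post_startup_commands:
    - npm ci              # install step instead of pip
agent:
  tools:
    execution_timeout: 300  # long JS builds need more than the default
```

The point is less the exact keys than the shape: environment setup lives under the environment section, agent behavior under the agent section, and a pinned config file doubles as a record of how a published result was produced.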
Who it’s for
SWE-agent fits researchers benchmarking agent capabilities and engineering teams curious about autonomous bug-fixing on Python codebases. Read the ACI paper before your first serious run — understanding why the commands are shaped the way they are will save you from fighting the framework.