How to Use SWE-agent
Solve GitHub issues automatically with SWE-agent. This guide covers the agent-computer interface, model configuration, and cost control, along with installation and first-run instructions.
SWE-agent is Princeton’s autonomous software-engineering agent that takes a GitHub issue and a repo, then writes, runs, and tests a patch end-to-end without human hand-holding.
SWE-agent is an open-source framework from the Princeton NLP group, built to solve real software-engineering tasks by driving a language model through a specially designed Agent-Computer Interface (ACI). It was the first agent to crack double-digit scores on SWE-bench, a benchmark built from real GitHub issues in popular Python repos, each paired with the pull request that resolved it. Researchers use it to study agent capabilities, teams use it to triage bug backlogs, and CTF players use the EnIGMA spin-off for capture-the-flag challenges. It’s MIT-licensed and maintained by researchers at Princeton and Stanford.
What it is
The core insight is the ACI: instead of giving a model raw shell access, SWE-agent exposes narrow, high-feedback commands (open, goto, edit, find_file, search_dir, submit) that a model can actually use well. It wraps these in a sandboxed Docker environment, runs the agent loop against providers like Claude, GPT, or any LiteLLM-supported model, and emits a patch plus a full trajectory log. Configuration lives in YAML files so you can swap prompts, tools, and models without touching code.
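To make the idea of a narrow, high-feedback command concrete, here is a minimal sketch of a windowed file viewer in the spirit of open and goto. The WindowedFile class, its window default, and its output format are all hypothetical and written for illustration; this is not SWE-agent's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class WindowedFile:
    """Hypothetical viewer that shows a file one fixed-size window at a time."""

    lines: list[str]
    window: int = 100  # lines per view; an assumed default, not taken from SWE-agent
    first: int = 0     # zero-based index of the first visible line

    def goto(self, line_number: int) -> str:
        """Centre the window on a 1-based line number and return the new view."""
        last_start = max(0, len(self.lines) - self.window)
        wanted = line_number - 1 - self.window // 2
        self.first = min(max(0, wanted), last_start)
        return self.render()

    def render(self) -> str:
        """Return the visible lines, numbered, plus a note on what is hidden."""
        end = min(len(self.lines), self.first + self.window)
        body = [f"{i + 1}:{self.lines[i]}" for i in range(self.first, end)]
        header = f"({self.first} more lines above)"
        footer = f"({len(self.lines) - end} more lines below)"
        return "\n".join([header, *body, footer])


if __name__ == "__main__":
    viewer = WindowedFile([f"line {n}" for n in range(1, 501)], window=5)
    print(viewer.goto(250))
```

The point of the design is feedback: every call returns a small, numbered view plus a reminder of how much of the file is out of sight, so the model never has to reason over an unbounded dump of text.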
Install
git clone https://github.com/SWE-agent/SWE-agent.git
cd SWE-agent
pip install --editable .
# Docker must be installed and running for sandboxed execution
First run
Point the agent at a live GitHub issue and watch it clone the repo, reproduce the bug, edit files, and emit a patch. Set your API key first.
$ export ANTHROPIC_API_KEY=sk-ant-...
$ sweagent run \
    --agent.model.name=claude-sonnet-4 \
    --problem_statement.github_url=https://github.com/pvlib/pvlib-python/issues/1603
[INFO] Cloned repo to /tmp/...
[INFO] Step 1: open pvlib/iotools/psm3.py
[INFO] Step 7: submit
[DONE] Patch written to trajectories/<run-id>/patch.diff
Everyday workflows
- Batch SWE-bench — run sweagent run-batch against the dataset to reproduce benchmark numbers locally.
- Fix local issues — pass --problem_statement.path to a text file describing a bug in your own codebase.
- Swap models — edit the YAML to try Claude, GPT-4o, DeepSeek, or a local model through LiteLLM without changing agent logic.
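When comparing models on the same issue, it can help to generate the command lines rather than retype them. The sketch below only assembles arguments and does not run anything. It uses the three flags that appear in this guide (--agent.model.name, --problem_statement.github_url, --problem_statement.path). The build_run_command helper and the bug.md file name are hypothetical, and a local run will probably need further options, such as pointing at the repository, which this sketch does not cover.

```python
import shlex
from typing import Optional


def build_run_command(
    model_name: str,
    issue_url: Optional[str] = None,
    issue_file: Optional[str] = None,
) -> list[str]:
    """Assemble a `sweagent run` argument list for one model and one issue.

    Hypothetical helper: exactly one of issue_url or issue_file must be given.
    """
    if (issue_url is None) == (issue_file is None):
        raise ValueError("pass exactly one of issue_url or issue_file")

    args = ["sweagent", "run", f"--agent.model.name={model_name}"]
    if issue_url is not None:
        args.append(f"--problem_statement.github_url={issue_url}")
    else:
        args.append(f"--problem_statement.path={issue_file}")
    return args


if __name__ == "__main__":
    # Model identifiers are examples; use whatever names your provider expects.
    for model in ["claude-sonnet-4", "gpt-4o"]:
        command = build_run_command(model, issue_file="bug.md")
        print(shlex.join(command))
```

Returning a list rather than a string keeps the arguments safe to hand to subprocess.run later without shell quoting problems.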
Gotchas and tips
Cost is real: a single SWE-bench instance can burn 50k–200k tokens on frontier models, and full-dataset runs get expensive fast. Start with ten instances to calibrate, and cache the Docker environments — rebuilding them for every task dominates wall-clock time on a cold machine. Trajectories are verbose JSON; browse them with the included inspector_web tool rather than tailing raw files.
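A rough budget check before a batch run might look like the sketch below. The 50k and 200k token figures come from the range quoted above; the price of 3.00 USD per million tokens is a placeholder assumption, not a real quote, and a single blended price ignores the usual difference between input and output token pricing.

```python
def estimate_batch_cost(
    instances: int,
    tokens_per_instance: int,
    usd_per_million_tokens: float,
) -> float:
    """Return an approximate cost in USD for a batch run.

    Uses one blended price for all tokens, which is a simplification.
    """
    total_tokens = instances * tokens_per_instance
    return total_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    ASSUMED_PRICE = 3.00  # USD per million tokens; placeholder, check your provider
    low = estimate_batch_cost(10, 50_000, ASSUMED_PRICE)
    high = estimate_batch_cost(10, 200_000, ASSUMED_PRICE)
    print(f"10 instances: between ${low:.2f} and ${high:.2f}")
```

Under that assumed price, a ten-instance calibration run lands between roughly 1.50 and 6.00 USD. Scaling the instances argument up to a full dataset shows how quickly the total grows.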
The agent is tuned for Python repos and pytest-style test suites. Non-Python languages and custom build systems work but often need a custom YAML with the right install and test commands. Pin the SWE-agent version if you’re publishing results — behavior shifts meaningfully between releases as prompts are refined.
Who it’s for
SWE-agent fits researchers benchmarking agent capabilities and engineering teams curious about autonomous bug-fixing on Python codebases. Read the ACI paper before your first serious run — understanding why the commands are shaped the way they are will save you from fighting the framework.