Free Tool Arena

How to Use Promptfoo

Installing promptfoo, writing promptfooconfig.yaml, assertions, red-teaming, CI integration, web UI.

Updated April 2026 · 6 min read

Promptfoo is a CLI that treats prompts like code — YAML tests, assertions, diffs, and CI-friendly output.

Promptfoo is what unit tests look like for LLMs. You declare prompts, test cases, and assertions in a YAML file, run promptfoo eval, and get a side-by-side grid of outputs with pass/fail scoring. It plugs into CI, supports red-teaming, and speaks nearly every model provider natively.

What it is

A Node.js CLI and web viewer. It loads a config, fans out requests across providers and prompt variants, runs deterministic (contains, regex, equals) and model-graded (llm-rubric, similar) assertions, and writes results to a local SQLite database. The viewer renders diffs and lets you share results.
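As a rough illustration of the config described above, here is a minimal sketch of a promptfooconfig.yaml. The prompts, the test text, and the rubric wording are made up for this example, and it is not the file that promptfoo init generates. It pairs one deterministic assertion (icontains, the case-insensitive variant of contains) with one model-graded assertion (llm-rubric).

```yaml
# promptfooconfig.yaml (illustrative sketch; prompts and test data are hypothetical)
description: Summarizer smoke test

# Two prompt variants; {{text}} is filled from each test's vars.
prompts:
  - "Summarize in one sentence: {{text}}"
  - "You are a concise editor. Summarize in one sentence: {{text}}"

# One provider; requires OPENAI_API_KEY in the environment.
providers:
  - openai:gpt-4o

tests:
  - vars:
      text: "Promptfoo is a CLI for evaluating LLM prompts with YAML test cases."
    assert:
      # Deterministic check: output must mention the tool name (case-insensitive).
      - type: icontains
        value: promptfoo
      # Model-graded check: a grader LLM judges the output against this rubric.
      - type: llm-rubric
        value: Is a single sentence and mentions evaluating prompts
```

With two prompts, one provider, and one test case, an eval of this file should produce two outputs, one per prompt variant.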

Install / set up

# global install
npm install -g promptfoo
promptfoo init
export OPENAI_API_KEY=sk-...

First run

promptfoo init creates a promptfooconfig.yaml with a sample prompt and two test cases. Run promptfoo eval and it executes every combination of prompts, providers, and tests, then prints a results table in the terminal. Run promptfoo view to open the same results as a grid in the browser.

$ promptfoo eval
[==================] 8/8 complete
$ promptfoo view
Open http://localhost:15500

Everyday workflows

  • Compare GPT-4o, Claude, and a local Llama on the same test set to pick the cheapest model that still passes.
  • Gate pull requests by running promptfoo eval in CI; the command exits with a non-zero status when a test fails, so prompt regressions block the merge.
  • Run promptfoo redteam to generate adversarial inputs (jailbreaks, PII leaks, prompt injection) against your app.
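The CI gating workflow could look something like the following GitHub Actions sketch. The workflow name, the watched paths, the Node version, and the secret name are assumptions for illustration; adjust them to your repository. It relies only on promptfoo eval returning a non-zero exit status when a test fails, which fails the job.

```yaml
# .github/workflows/prompt-eval.yml (hypothetical file name and layout)
name: prompt-eval

on:
  pull_request:
    paths:
      # Assumed locations; only re-run the eval when prompts or config change.
      - "prompts/**"
      - "promptfooconfig.yaml"

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          # Assumption: pick a Node version supported by your promptfoo release.
          node-version: 22
      - name: Run promptfoo eval
        env:
          # Assumes the key is stored as a repository secret with this name.
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        # A failing assertion makes this step exit non-zero and fails the check.
        run: npx promptfoo@latest eval -c promptfooconfig.yaml -o results.json
```

Writing results.json is optional here; it is included so a later step could upload the run as a build artifact.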

Gotchas and tips

Model-graded assertions use an LLM to grade outputs, so each one adds a grader call on top of the call under test, and the grader itself can be wrong. Pin the grader to a strong model (gpt-4o or claude-3-5-sonnet), keep the response cache enabled (it is on by default, so avoid passing --no-cache), and spot-check failures manually for the first few runs.
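One way to pin the grader is through defaultTest, so every model-graded assertion in the file uses the same model. The fragment below is a sketch meant to be merged into an existing promptfooconfig.yaml; the question variable and the rubric text are hypothetical.

```yaml
# Fragment of promptfooconfig.yaml (merge into your existing config)
defaultTest:
  options:
    # Grader used by model-graded assertions such as llm-rubric.
    provider: openai:gpt-4o

tests:
  - vars:
      # Hypothetical variable; it must match a {{question}} placeholder in your prompt.
      question: "What is your refund window?"
    assert:
      - type: llm-rubric
        value: States a specific refund window and does not invent policy details
```

Keeping the grader fixed while you swap the providers under test makes scores comparable across runs, since only one side of the comparison changes.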

Config files grow fast. Split tests into separate YAMLs and include them with tests: file://tests/*.yaml, and store expensive fixtures in vars files so you’re not pasting 500-line prompts into the main config. Results land in a SQLite database under ~/.promptfoo by default, outside the repository; if you don’t have a shared backend, write each run to a file with --output results.json and commit or archive that to keep a history.
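A split layout might look like the sketch below, shown as two YAML documents separated by ---, each of which would live in its own file. The file names, the ticket variable, and the fixture path are assumptions for illustration.

```yaml
# File 1: promptfooconfig.yaml (main config stays short)
prompts:
  - file://prompts/support.txt   # hypothetical prompt file containing {{ticket}}
providers:
  - openai:gpt-4o
# Pull in every test file under tests/.
tests: file://tests/*.yaml
---
# File 2: tests/refunds.yaml (hypothetical name; a plain list of test cases)
- vars:
    # Load the long fixture from disk instead of pasting it inline.
    ticket: file://fixtures/long_ticket.txt
  assert:
    - type: icontains
      value: refund
```

Grouping test files by feature (refunds, billing, onboarding) keeps diffs small when a single area of the prompt changes.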

Who it’s for

Engineers who treat prompts as production code and want a Jest-style workflow for them. Also security teams running red-team exercises: the built-in attack library generates jailbreak, PII-leak, and prompt-injection inputs that would otherwise have to be written by hand.
