How to Use Promptfoo

Write promptfooconfig.yaml assertions, run adversarial tests, view results in the web UI, and integrate prompt evaluation into your CI pipeline.

By FreeToolArena Staff · Updated June 2026 · 6 min read

Promptfoo is a CLI that treats prompts like code — YAML tests, assertions, diffs, and CI-friendly output.

Promptfoo is what unit tests look like for LLMs. You declare prompts, test cases, and assertions in a YAML file, run promptfoo eval, and get a side-by-side grid of outputs with pass/fail scoring. It plugs into CI, supports red-teaming, and ships built-in integrations for most major model providers, including hosted APIs and local models.

What it is

A Node.js CLI and web viewer. It loads a config, fans out requests across providers and prompt variants, runs deterministic (contains, regex, equals) and model-graded (llm-rubric, similar) assertions, and writes results to a local SQLite database. The viewer renders diffs and lets you share results.
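
As a rough illustration of the two assertion families, a config might look like the sketch below. The prompt text, the variable name text, the test input, the similarity threshold, and the model ID are all made up for this example; check the assertion reference for the options your installed version supports.

```yaml
# promptfooconfig.yaml (illustrative sketch; prompt, vars, and model are assumptions)
description: Summary prompt checks

prompts:
  - "Summarize in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      text: "Promptfoo is a CLI for evaluating LLM prompts with YAML test cases."
    assert:
      # Deterministic: case-insensitive substring check
      - type: icontains
        value: promptfoo
      # Deterministic: output must end with sentence punctuation
      - type: regex
        value: '[.!?]\s*$'
      # Model-graded: an LLM scores the output against a plain-language rubric
      - type: llm-rubric
        value: Is a single sentence and does not add facts missing from the input
      # Similarity: embedding comparison against a reference answer
      - type: similar
        value: Promptfoo is a command-line tool for testing prompts.
        threshold: 0.8
```

Deterministic checks are cheap and stable, so they make a good first line of defense; the model-graded ones are better reserved for qualities that a string match cannot capture.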

Install / set up

# global install
npm install -g promptfoo
promptfoo init
export OPENAI_API_KEY=sk-...

First run

promptfoo init creates a promptfooconfig.yaml with a sample prompt and two test cases. Run promptfoo eval and it executes every combination of prompts, providers, and tests, then prints a results table in the terminal. Run promptfoo view afterwards to open the same results as a grid in the browser.

$ promptfoo eval
[==================] 8/8 complete
$ promptfoo view
Open http://localhost:15500

Everyday workflows

  • Compare GPT-4o, Claude, and a local Llama on the same test set to pick the cheapest model that still passes.
  • Gate pull requests by running promptfoo eval in CI: the command exits with a non-zero status when any test fails, so prompt regressions block the merge instead of shipping.
  • Run promptfoo redteam to generate adversarial inputs (jailbreaks, PII leaks, prompt injection) against your app.
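
The CI gating workflow from the list above could be sketched as a GitHub Actions job along these lines. The workflow file name, the trigger paths, the Node version, and the secret name are assumptions for illustration, and the sketch relies on promptfoo eval returning a non-zero exit code when a test fails, which is worth verifying against the version you install.

```yaml
# .github/workflows/prompt-eval.yml (hypothetical file name and job layout)
name: prompt-eval

on:
  pull_request:
    paths:
      - "promptfooconfig.yaml"
      - "prompts/**"

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # A non-zero exit from promptfoo eval fails this step, and with it the PR check
      - name: Run promptfoo eval
        run: npx promptfoo@latest eval -c promptfooconfig.yaml -o results.json
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

Restricting the trigger to prompt and config paths keeps API spend down, since the eval only runs when something that could change model output is touched.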

Gotchas and tips

Model-graded assertions use an LLM to grade outputs, so each one adds a grader call on top of the generation call (roughly doubling the cost per test), and the grader itself can be wrong. Pin the grader to a strong model (gpt-4o or claude-3-5-sonnet), leave the default response cache enabled (that is, don't pass --no-cache), and spot-check failures manually for the first few runs.
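
One way to pin the grader is to set it once for every test through defaultTest. This is a minimal fragment, assuming an OpenAI key is available; the specific grader model is only an example.

```yaml
# Fragment of promptfooconfig.yaml (sketch; the grader model is an assumption)
defaultTest:
  options:
    # Model used to grade llm-rubric and other model-graded assertions
    provider: openai:gpt-4o
```

Setting it in defaultTest rather than on individual tests means the grader stays fixed even when the providers under test change, so score shifts can be attributed to the prompt or model being evaluated.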

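Config files grow fast. Split tests into separate YAMLs and include them with tests: file://tests/*.yaml, and store expensive fixtures in vars files so you’re not pasting 500-line prompts into the main config. Promptfoo’s SQLite database lives in your home directory (~/.promptfoo by default) rather than in the repo, so if you don’t have a shared backend, keep a history by writing each run to a file with --output results.json and storing that alongside the config.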
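
A split layout might look like the following sketch. The two YAML documents below stand for two separate files, and every directory and file name in them (prompts/summarize.txt, tests/long_document.yaml, fixtures/annual_report.txt) is hypothetical.

```yaml
# File 1: promptfooconfig.yaml (stays short; test cases live elsewhere)
prompts:
  - file://prompts/summarize.txt
providers:
  - openai:gpt-4o-mini
tests: file://tests/*.yaml
---
# File 2: tests/long_document.yaml (one of the files matched by the glob)
- vars:
    # Large fixture is loaded from disk instead of being pasted inline
    text: file://fixtures/annual_report.txt
  assert:
    - type: icontains
      value: revenue
```

Keeping fixtures in their own files also makes diffs readable: a change to a test's assertions shows up as a few lines rather than being buried next to a long document.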

Who it’s for

Engineers who treat prompts as production code and want a Jest-style workflow for them. Also security teams running red-team exercises: the built-in attack library generates jailbreak, PII-leak, and prompt-injection cases that would otherwise have to be written by hand.
