AI & LLMs · Guide · AI & Prompt Tools
How to Use Hermes Models
Master Hermes 3 system prompts and function-calling syntax to extract reliable structured data from the Nous Research Llama tune. Free online walkthrough.
Hermes is Nous Research’s family of open-weight fine-tunes built on top of Meta’s Llama base models. This guide covers what Hermes 3 is actually good at, how to pick a size, and how to run it locally alongside your existing LLM stack.
Advertisement
What Hermes models are
Hermes 3 is Nous Research’s flagship fine-tune series, released in sizes matching the Llama 3.1 base (8B, 70B, 405B). Nous specializes in instruction-following, function calling, structured outputs, long-context reliability, and preserving steerability — Hermes models tend to refuse less than stock Llama-Instruct and follow system prompts more literally.
The weights are Llama-3.1-licensed (inherited from Meta), so you can use them commercially under the usual Llama terms. They publish on Hugging Face under NousResearch/Hermes-3-Llama-3.1-*.
Picking the right size
Choose based on your hardware and task:
- Hermes 3 8B — runs on a 16GB laptop at Q4. Good agent/assistant quality, better function-calling than stock Llama 3.1 Instruct.
- Hermes 3 70B — needs serious hardware (48GB+ VRAM at Q4, or a Mac Studio with sufficient unified memory). Competitive with frontier open models on reasoning.
- Hermes 3 405B — datacenter-only. Multi-GPU or quantized heavily on an H100 cluster.
For most local use cases, start with the 8B. It is the pragmatic sweet spot and ships with the same function-calling and structured-output training as its larger siblings.
Running Hermes locally
With Ollama, pull a community GGUF port (or roll your own via llama.cpp’s converter):
ollama pull hermes3:8b ollama run hermes3:8b "You are a terse code reviewer. Review this function: ..."
With llama.cpp directly, download a GGUF and serve it:
huggingface-cli download bartowski/Hermes-3-Llama-3.1-8B-GGUF \ Hermes-3-Llama-3.1-8B-Q4_K_M.gguf --local-dir ./models ./build/bin/llama-server -m ./models/Hermes-3-Llama-3.1-8B-Q4_K_M.gguf \ --host 0.0.0.0 --port 8080 -c 8192 -ngl 99
Using function calling and structured outputs
Hermes 3 uses a specific tool-call format that it was trained on. It emits calls wrapped in <tool_call>...</tool_call> XML tags with JSON payloads. The model card spells out the exact system prompt template — read it before building an agent on top.
For strict JSON output, combine a clear system prompt with llama.cpp’s --grammar flag or a GBNF grammar file to constrain decoding. You will get dramatically more reliable structured outputs than relying on the model alone:
./build/bin/llama-cli -m ./models/hermes-3-8b.gguf \ --grammar-file json.gbnf \ -p "Extract name and age as JSON from: 'Sam is 34.'"
Sampling settings that matter
Hermes benefits from slightly lower temperatures than stock Llama for agentic work. Try temperature=0.4, top_p=0.9, and a mild repeat penalty of 1.05 as a starting point. For creative writing, push temperature up to 0.8–1.0. Context length is inherited from Llama 3.1, so 128k is supported on paper, but quality degrades past ~32k unless your hardware can fit the full KV cache.
When Hermes is the wrong choice
If you are doing code-specific work, Qwen 2.5 Coder or DeepSeek-Coder V2 usually beat Hermes at the same size. If you want the absolute most refusal-free chat model, there are more specialized fine-tunes — though they come with their own risks. For general-purpose assistants, agents, and function-calling workloads on open weights, Hermes 3 is a strong, well-supported default.
Use these while you read
Tools that pair with this guide
- System Prompt BuilderCompose a focused system prompt from a role, tone, constraints, and output format — copy-ready for any LLM.AI & Prompt Tools
- AI Prompt GeneratorTurn a vague idea into a structured prompt. Pick role, task, context, constraints, and output format. Works with ChatGPT, Claude, and Gemini.AI & Prompt Tools
- AI Token CounterEstimate tokens, characters, words, and approximate API cost for GPT-4o, GPT-4, Claude, and Gemini — before you hit send.AI & Prompt Tools
- AI Prompt LibraryBrowse a curated catalog of prompt templates for writing, coding, marketing, and research. One click to copy.AI & Prompt Tools
Advertisement
Continue reading
- AI & LLMsGitHub Copilot Pricing and ComparisonCompare free vs paid GitHub Copilot tiers and analyze it against ChatGPT, Cursor, and Tabnine. Find the best value plan instantly with this free online guide.
- AI & LLMsGitHub Copilot Features and CapabilitiesTest what Copilot really does — code accuracy, scope limits, debugging, web dev, legacy code, tests, docs, team customization. Free guide, no sign-up.
- AI & LLMsGitHub Copilot Security and Data HandlingAudit where your code goes, who sees it, training-data policy, network needs, and what happens when Copilot suggests broken code. Free, no sign-up.
- AI & LLMsAI Fluency SkillsThe 8 sub-skills of AI fluency: prompt structure, model selection, tool use, quality calibration, iteration, context management, cost awareness, privacy.
- AI & LLMsAnthropic Skills ExplainedSkills as Anthropic's answer to Custom GPTs — markdown-defined, version-controlled in git, work in terminal. Anatomy + Skills vs Custom GPTs.
- AI & LLMsKimi K2 vs DeepSeek V3Two open-weight Chinese flagships. Kimi K2 = 1M context, DeepSeek V3.2 = top-tier reasoning + coding. Pick by use case.