How to Use GPT4All
Installing GPT4All, downloading models, using LocalDocs for private RAG over your files, embedding with SBert.
GPT4All is a desktop client from Nomic AI for running open-source LLMs locally on commodity hardware. It bundles model discovery, chat, and a local document-retrieval feature called LocalDocs into a single free application.
What GPT4All is
GPT4All started in 2023 as one of the earliest easy-to-use local LLM apps and has since matured into a stable cross-platform client. It wraps llama.cpp for inference, maintains a curated catalog of GGUF models, and ships LocalDocs — a RAG feature that indexes folders of PDFs, markdown, code, and office docs into a local vector store. The project is MIT-licensed with commercial use allowed.
Compared to LM Studio or Jan, GPT4All leans heavier into “chat with your files” as the default workflow rather than just raw chat.
Installing GPT4All
Grab the installer for macOS, Windows, or Ubuntu from nomic.ai/gpt4all. The installer is a straightforward wizard; on Linux you can also use the provided .run file. First launch prompts you to opt in or out of anonymous telemetry — decline if you want it fully offline.
Models download into ~/Library/Application Support/nomic.ai/GPT4All/ on macOS, with equivalent paths on Windows and Linux. If disk space is tight, symlink that directory to an external drive.
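If you script around the model directory, a small helper can resolve the per-platform default. The macOS path is the one given above; the Windows and Linux locations below are assumptions based on typical app-data conventions, so verify them against your own install:

```python
import os
import platform
from pathlib import Path

def gpt4all_model_dir() -> Path:
    """Best-guess default GPT4All model directory for the current OS.

    The macOS path matches the documented location; Windows/Linux paths
    are assumptions based on common app-data conventions.
    """
    home = Path.home()
    system = platform.system()
    if system == "Darwin":
        return home / "Library/Application Support/nomic.ai/GPT4All"
    if system == "Windows":
        base = Path(os.environ.get("LOCALAPPDATA", home / "AppData/Local"))
        return base / "nomic.ai" / "GPT4All"
    # Linux and everything else: XDG-style data directory
    return home / ".local/share/nomic.ai/GPT4All"
```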
Picking and downloading a model
Open the Models tab. GPT4All surfaces a short list of battle-tested GGUF models with size and RAM requirements clearly labeled. Good starting picks:
- Llama 3.1 8B Instruct — general-purpose, needs ~8GB RAM
- Qwen 2.5 Coder 7B — code assistance, similar memory
- Phi-3 Mini 4K — runs on 8GB machines with headroom
- Mistral 7B Instruct — fast and reliable baseline
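To make the RAM labels concrete, here is a toy sketch that filters the catalog above by available memory. The model names and RAM figures are copied from the list; `models_that_fit` is a hypothetical helper, not part of GPT4All:

```python
# Approximate working-set RAM in GB, taken from the catalog labels above.
MODELS = {
    "Llama 3.1 8B Instruct": 8,
    "Qwen 2.5 Coder 7B": 8,
    "Phi-3 Mini 4K": 4,
    "Mistral 7B Instruct": 8,
}

def models_that_fit(ram_gb: float) -> list[str]:
    """Return catalog models whose approximate RAM need fits the machine."""
    return [name for name, need in MODELS.items() if need <= ram_gb]

# On an 8GB laptop all four fit; with only 4GB free, Phi-3 Mini is the safe pick.
```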
Click Download and watch the progress bar. Switch to the Chats tab and pick the model from the top-right dropdown to start a session.
Using LocalDocs for private RAG
LocalDocs is the killer feature. In the LocalDocs tab, click + Add Collection, name it, and point it at a folder of documents. GPT4All scans supported file types (PDF, DOCX, TXT, MD, source code), chunks them, and embeds them locally using a built-in Nomic Embed model.
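Under the hood, indexing boils down to splitting each file into overlapping chunks before embedding. A minimal sketch of that chunking step (the size and overlap values here are illustrative, not GPT4All's actual defaults):

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size chunks with overlap, the way a RAG
    indexer like LocalDocs prepares documents for embedding.

    Sizes are illustrative; GPT4All's real chunking parameters may differ.
    """
    chunks = []
    step = size - overlap  # each chunk shares `overlap` chars with the next
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```

Overlap matters because a sentence split cleanly between two chunks would otherwise be unretrievable as a unit.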
In a chat thread, toggle the collection on via the database icon. Queries now retrieve relevant chunks from your documents before generating. The sidebar shows citations so you can verify the model did not hallucinate. Nothing leaves your machine.
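Retrieval itself is a nearest-neighbor search over those chunk embeddings. A toy sketch with hand-made vectors (real embeddings come from the local embedding model, and GPT4All's actual ranking internals may differ):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k chunks most similar to the query vector.
    This is the retrieval step a RAG pipeline runs before generation."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

The retrieved chunks are prepended to the prompt, which is why the sidebar can cite exactly which passages the answer drew on.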
API access and configuration
Open Settings → Application → API Server and flip it on. GPT4All exposes an OpenAI-compatible endpoint at http://localhost:4891/v1:
```shell
curl http://localhost:4891/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama 3.1 8B Instruct",
    "messages": [{"role": "user", "content": "ping"}]
  }'
```

Under Settings → Model, you can tune temperature, top-k, top-p, repeat penalty, and context length per model. If you have an NVIDIA GPU or Apple Silicon, enable GPU acceleration in Settings → Application — CPU-only inference is slow on 7B+ models.
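Because the endpoint is OpenAI-compatible, the same call works from any HTTP client. A stdlib-only Python sketch that builds the request shown above (actually sending it requires the API server toggled on in Settings):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:4891/v1") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for GPT4All's local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With the API server enabled in GPT4All, send it with:
# resp = urllib.request.urlopen(build_chat_request("Llama 3.1 8B Instruct", "ping"))
```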
When GPT4All is the wrong choice
GPT4All is great for privacy-focused desktop use and for non-technical teammates who need a no-config “chat with my PDFs” tool. It is not designed for production serving, multi-user deployment, or rapid model experimentation — its curated catalog is narrower than LM Studio’s Hugging Face browser. For servers, reach for Ollama. For raw breadth of models, LM Studio. For polished local RAG out of the box, GPT4All is hard to beat.