How to Use GPT4All
Installing GPT4All, downloading models, using LocalDocs for private RAG over your files, embedding with SBert.
GPT4All is a desktop client from Nomic AI for running open-source LLMs locally on commodity hardware. It bundles model discovery, chat, and a local document-retrieval feature called LocalDocs into a single free application.
What GPT4All is
GPT4All started in 2023 as one of the earliest easy-to-use local LLM apps and has since matured into a stable cross-platform client. It wraps llama.cpp for inference, maintains a curated catalog of GGUF models, and ships LocalDocs — a RAG feature that indexes folders of PDFs, markdown, code, and office docs into a local vector store. The project is MIT-licensed with commercial use allowed.
Compared to LM Studio or Jan, GPT4All leans heavier into “chat with your files” as the default workflow rather than just raw chat.
Installing GPT4All
Grab the installer for macOS, Windows, or Ubuntu from nomic.ai/gpt4all. The installer is a straightforward wizard; on Linux you can also use the provided .run file. First launch prompts you to opt in or out of anonymous telemetry — decline if you want it fully offline.
Models download into ~/Library/Application Support/nomic.ai/GPT4All/ on macOS, with equivalent paths on Windows and Linux. If disk space is tight, symlink that directory to an external drive.
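If you script around the model directory, a small helper can resolve the per-platform default. The macOS path is the one given above; the Windows and Linux locations below are assumptions based on typical app-data conventions, so verify them against your own install:

```python
import os
import platform
from pathlib import Path

def gpt4all_model_dir() -> Path:
    """Best-guess default GPT4All model directory for the current OS.

    The macOS path matches the documented location; Windows/Linux paths
    are assumptions based on common app-data conventions.
    """
    home = Path.home()
    system = platform.system()
    if system == "Darwin":
        return home / "Library/Application Support/nomic.ai/GPT4All"
    if system == "Windows":
        base = Path(os.environ.get("LOCALAPPDATA", home / "AppData/Local"))
        return base / "nomic.ai" / "GPT4All"
    # Linux and everything else: XDG-style data directory
    return home / ".local/share/nomic.ai/GPT4All"
```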
Picking and downloading a model
Open the Models tab. GPT4All surfaces a short list of battle-tested GGUF models with size and RAM requirements clearly labeled. Good starting picks:
- Llama 3.1 8B Instruct — general-purpose, needs ~8GB RAM
- Qwen 2.5 Coder 7B — code assistance, similar memory
- Phi-3 Mini 4K — runs on 8GB machines with headroom
- Mistral 7B Instruct — fast and reliable baseline
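To make the RAM labels concrete, here is a toy sketch that filters the catalog above by available memory. The model names and RAM figures are copied from the list; `models_that_fit` is a hypothetical helper, not part of GPT4All:

```python
# Approximate working-set RAM in GB, taken from the catalog labels above.
MODELS = {
    "Llama 3.1 8B Instruct": 8,
    "Qwen 2.5 Coder 7B": 8,
    "Phi-3 Mini 4K": 4,
    "Mistral 7B Instruct": 8,
}

def models_that_fit(ram_gb: float) -> list[str]:
    """Return catalog models whose approximate RAM need fits the machine."""
    return [name for name, need in MODELS.items() if need <= ram_gb]

# On an 8GB laptop all four fit; with only 4GB free, Phi-3 Mini is the safe pick.
```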
Click Download and watch the progress bar. Switch to the Chats tab and pick the model from the top-right dropdown to start a session.
Using LocalDocs for private RAG
LocalDocs is the killer feature. In the LocalDocs tab, click + Add Collection, name it, and point it at a folder of documents. GPT4All scans supported file types (PDF, DOCX, TXT, MD, source code), chunks them, and embeds them locally using a built-in Nomic Embed model.
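Under the hood, indexing boils down to splitting each file into overlapping chunks before embedding. A minimal sketch of that chunking step (the size and overlap values here are illustrative, not GPT4All's actual defaults):

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size chunks with overlap, the way a RAG
    indexer like LocalDocs prepares documents for embedding.

    Sizes are illustrative; GPT4All's real chunking parameters may differ.
    """
    chunks = []
    step = size - overlap  # each chunk shares `overlap` chars with the next
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```

Overlap matters because a sentence split cleanly between two chunks would otherwise be unretrievable as a unit.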
In a chat thread, toggle the collection on via the database icon. Queries now retrieve relevant chunks from your documents before generating. The sidebar shows citations so you can verify the model did not hallucinate. Nothing leaves your machine.
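Retrieval itself is a nearest-neighbor search over those chunk embeddings. A toy sketch with hand-made vectors (real embeddings come from the local embedding model, and GPT4All's actual ranking internals may differ):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k chunks most similar to the query vector.
    This is the retrieval step a RAG pipeline runs before generation."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

The retrieved chunks are prepended to the prompt, which is why the sidebar can cite exactly which passages the answer drew on.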
API access and configuration
Open Settings → Application → API Server and flip it on. GPT4All exposes an OpenAI-compatible endpoint at http://localhost:4891/v1:
```shell
curl http://localhost:4891/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama 3.1 8B Instruct",
    "messages": [{"role": "user", "content": "ping"}]
  }'
```

Under Settings → Model, you can tune temperature, top-k, top-p, repeat penalty, and context length per model. If you have an NVIDIA GPU or Apple Silicon, enable GPU acceleration in Settings → Application — CPU-only inference is slow on 7B+ models.
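Because the endpoint is OpenAI-compatible, the same call works from any HTTP client. A stdlib-only Python sketch that builds the request shown above (actually sending it requires the API server toggled on in Settings):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:4891/v1") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for GPT4All's local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With the API server enabled in GPT4All, send it with:
# resp = urllib.request.urlopen(build_chat_request("Llama 3.1 8B Instruct", "ping"))
```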
When GPT4All is the wrong choice
GPT4All is great for privacy-focused desktop use and for non-technical teammates who need a no-config “chat with my PDFs” tool. It is not designed for production serving, multi-user deployment, or rapid model experimentation — its curated catalog is narrower than LM Studio’s Hugging Face browser. For servers, reach for Ollama. For raw breadth of models, LM Studio. For polished local RAG out of the box, GPT4All is hard to beat.