How to Use LM Studio
Pick models from the catalog, chat in the desktop UI, and expose an OpenAI-compatible server on port 1234 with GPU offloading. Free to run locally.
LM Studio is a desktop GUI for running local LLMs — download weights from a built-in Hugging Face browser, chat with them in a clean UI, and expose an OpenAI-compatible server on localhost. This guide covers a working setup on a typical developer laptop.
What LM Studio is
LM Studio is an Electron app that wraps llama.cpp (and optionally MLX on Apple Silicon) with a polished UI. It handles model discovery, downloads, GPU offload config, chat templates, and serving through a single window. If Ollama is the CLI/server experience, LM Studio is the desktop-client experience — and the two coexist fine on the same machine.
It is free for personal use. Commercial use has historically required filling out a form on their site, and the terms have changed over time; read the latest terms before rolling it out to coworkers.
Install and first launch
Download the installer for macOS, Windows, or Linux from lmstudio.ai. On first launch it will ask which runtime to use — pick the CUDA build on NVIDIA, Metal on Apple Silicon, or the Vulkan/ROCm build on AMD. The app self-updates the runtime from within Settings.
Check the Hardware tab under Settings. It should detect your GPU and show available VRAM. If it does not, your drivers are likely out of date — fix that before loading a model.
Downloading and loading a model
Hit the magnifying-glass icon to open the model search. Type something like llama-3.1-8b-instruct and LM Studio surfaces GGUF quantizations from Hugging Face. Each result shows download size and a green/yellow/red badge for whether it will fit in your RAM + VRAM.
For a 16GB MacBook, Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf is a good first pick. Download it, then click the Chat tab and select it from the top dropdown. The first load takes a few seconds while weights stream into GPU memory.
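To sanity-check a fit badge by hand, a rough estimate is parameter count times bits per weight. The sketch below is not LM Studio's actual logic. It assumes about 8.03 billion parameters for Llama 3.1 8B, roughly 4.85 bits per weight for Q4_K_M (an approximate figure), and a hypothetical 2 GB of headroom for the KV cache and everything else running on the machine.

```python
def estimate_gguf_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(model_gb: float, memory_gb: float, headroom_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed headroom fit in the given memory."""
    return model_gb + headroom_gb <= memory_gb


# Assumed figures: ~8.03e9 parameters, ~4.85 bits per weight for Q4_K_M.
size = estimate_gguf_gb(8.03e9, 4.85)
print(f"estimated size: {size:.1f} GB")       # estimated size: 4.9 GB
print("fits in 16 GB:", fits(size, 16.0))     # fits in 16 GB: True
print("fits in 6 GB:", fits(size, 6.0))       # fits in 6 GB: False
```

Treat memory_gb as the memory you can actually spare, not the number on the spec sheet; on a laptop the usable share is smaller than the installed total.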
Using the local server
Click the green Developer tab on the left sidebar and toggle Status: Running. LM Studio now exposes an OpenAI-compatible API at http://localhost:1234/v1:
    curl http://localhost:1234/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "ping"}]
      }'
From Python, use the OpenAI SDK with base_url="http://localhost:1234/v1" and any non-empty API key. Structured outputs and tool calling work for models that were fine-tuned for them.
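The Python route described above could look like the following minimal sketch. It assumes the openai package is installed, the server is running on the default port, and the loaded model's identifier is llama-3.1-8b-instruct as in the curl example. The helper names build_messages and ask are made up for illustration, and the API key string is an arbitrary placeholder.

```python
BASE_URL = "http://localhost:1234/v1"   # LM Studio server address used in this guide
MODEL = "llama-3.1-8b-instruct"         # assumed identifier; use the one your server reports


def build_messages(prompt: str, system: str = "") -> list:
    """Build an OpenAI-style messages list, with an optional system prompt."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages


def ask(prompt: str, model: str = MODEL) -> str:
    """Send one chat turn to the local server and return the reply text."""
    # Imported here so the helpers above work even without the SDK installed.
    from openai import OpenAI  # pip install openai

    # Any non-empty key is accepted; "lm-studio" is just a placeholder.
    client = OpenAI(base_url=BASE_URL, api_key="lm-studio")
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(prompt),
    )
    return response.choices[0].message.content
```

With the server toggled on, calling ask("ping") should return the model's reply as a string. If the call fails with a connection error, check that the server status in the Developer tab is Running.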
GPU offload and context length
In the right-side configuration panel, the GPU Offload slider controls how many transformer layers run on the GPU. Push it to max if VRAM allows; if you OOM at load time, back off a few layers. The Context Length field sets the KV-cache window. The KV cache grows linearly with context length, and in some kernels attention scratch memory grows quadratically, so start at 4096 and raise it only if you actually need it.
Enable Flash Attention when available — it cuts memory and speeds up long contexts. On Apple Silicon, try the MLX runtime variants of models for measurably faster token throughput than GGUF.
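To see why context length is worth setting conservatively, here is a back-of-the-envelope KV-cache estimate. It is a sketch under stated assumptions: the Llama 3.1 8B shape (32 layers, 8 key/value heads from grouped-query attention, head dimension 128) and a 16-bit cache. The runtime may quantize the cache or add its own overhead, so read the numbers as orders of magnitude.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for a full context window."""
    # The leading 2 accounts for one K tensor and one V tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Assumed model shape for Llama 3.1 8B.
LLAMA_31_8B = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for ctx in (4096, 32768, 131072):
    gib = kv_cache_bytes(context_len=ctx, **LLAMA_31_8B) / 2**30
    print(f"{ctx:>7} tokens -> {gib:.1f} GiB")
#    4096 tokens -> 0.5 GiB
#   32768 tokens -> 4.0 GiB
#  131072 tokens -> 16.0 GiB
```

Under these assumptions, the cache at the full 131072-token window is several times larger than the Q4_K_M weights themselves, which is why a 4096 starting point is a sensible default on a laptop.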
When LM Studio is the wrong choice
LM Studio is great on a workstation but a bad fit for headless servers (it is a GUI app) and for automation pipelines where you want models defined in code. It is also closed-source, which matters if you need to audit the stack. For servers, use Ollama or llama.cpp directly. For desktop use and quickly A/B-testing models, LM Studio is the fastest path from zero to a running local LLM.