How to Use Tabby
Deploy a self-hosted, open-source coding copilot with Tabby and Docker, and get free, private AI code completions directly in your editor.
Tabby is a self-hosted, open-source alternative to GitHub Copilot: you run the inference server on your own hardware and get private code completions in every supported editor.
Built by TabbyML, Tabby combines a Rust server distributed as a single Docker image, a curated registry of small code models that the server downloads on first start, and IDE extensions installed from each editor’s marketplace. It’s popular with teams that can’t send source code to third-party clouds but still want inline AI completions and repo-aware chat.
What it is
Tabby ships three things: an inference server (Rust, llama.cpp backend) that serves models like StarCoder2 or DeepSeek-Coder; editor plugins for VS Code, JetBrains, Vim, and Emacs; and a web Code Browser that indexes your Git repos for RAG-style chat. It runs on CPU, CUDA, ROCm, or Apple Metal.
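Each of those backends corresponds to a value of the `--device` flag on `tabby serve` (`cpu`, `cuda`, `rocm`, `metal`). The helper below is a minimal sketch for picking one automatically; the function name `detect_tabby_device` is hypothetical, and checking for vendor tools such as `nvidia-smi` is a heuristic, not how Tabby itself detects hardware. Note that Docker on macOS cannot reach the GPU, so `metal` applies to a native (non-Docker) install.

```shell
#!/usr/bin/env bash
# Sketch: guess which --device value to pass to `tabby serve` on this machine.
# Heuristic only (assumption): the presence of vendor CLI tools is treated as a
# sign that the matching GPU stack is installed.
detect_tabby_device() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo cuda    # NVIDIA GPU with drivers installed
  elif command -v rocm-smi >/dev/null 2>&1; then
    echo rocm    # AMD GPU with ROCm installed
  elif [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
    echo metal   # Apple Silicon; requires a native install, not Docker
  else
    echo cpu     # fallback: slower, but works everywhere
  fi
}

# Example:
#   tabby serve --model StarCoder2-3B --device "$(detect_tabby_device)"
```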
Install / sign up
```shell
# Docker with NVIDIA GPU
docker run -it --gpus all \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder2-3B --device cuda
# Visit http://localhost:8080 and create the admin account
```
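The first start can take a while because the server downloads the model weights before it begins answering. A small polling loop avoids guessing when it is ready. This is a sketch under two assumptions: the server exposes `GET /v1/health`, and if your deployment requires a token for it, you export it as `TABBY_TOKEN` (a variable name chosen for this example) so it is sent as a Bearer header. `wait_for_tabby` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: poll Tabby's health endpoint until the server answers.
# Assumptions: GET /v1/health exists; TABBY_TOKEN (hypothetical variable name)
# is sent as a Bearer token only if it is set.
wait_for_tabby() {
  local url="${1:-http://localhost:8080}"   # server base URL
  local attempts="${2:-60}"                  # number of tries, about 1 s apart
  local auth=()
  if [ -n "${TABBY_TOKEN:-}" ]; then
    auth=(-H "Authorization: Bearer ${TABBY_TOKEN}")
  fi
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f: treat HTTP errors as failures; --max-time: do not hang on a dead socket
    if curl -fsS --max-time 2 "${auth[@]}" "${url}/v1/health" >/dev/null 2>&1; then
      echo "Tabby is up at ${url}"
      return 0
    fi
    if (( i < attempts )); then
      sleep 1
    fi
  done
  echo "Tabby did not answer at ${url} after ${attempts} attempts" >&2
  return 1
}

# Example: wait_for_tabby http://localhost:8080 120 && echo "open the web UI now"
```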
First session
Once the server is up, install the VS Code extension, point it at http://localhost:8080, and paste the token shown in the web UI. Start typing: grey inline completions typically appear within a few hundred milliseconds on a mid-range GPU.
```shell
code --install-extension TabbyML.vscode-tabby
# In settings, set tabby.endpoint = http://localhost:8080
# Start typing a function signature; completions stream in
```
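On machines you provision by script, the endpoint and token can also be written to the extension agent's config file instead of being entered by hand. This sketch assumes the agent reads `~/.tabby-client/agent/config.toml` with a `[server]` table holding `endpoint` and `token`; check the documentation for your extension version before relying on it. The function name `write_tabby_agent_config` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write the Tabby IDE agent config so the extension finds the server.
# Assumption: the agent reads ~/.tabby-client/agent/config.toml with a
# [server] table containing `endpoint` and `token`.
write_tabby_agent_config() {
  local endpoint="$1"                           # e.g. http://localhost:8080
  local token="$2"                              # token copied from the web UI
  local dir="${3:-$HOME/.tabby-client/agent}"   # third arg overrides the location
  mkdir -p "$dir" || return 1
  cat > "$dir/config.toml" <<EOF
[server]
endpoint = "${endpoint}"
token = "${token}"
EOF
  chmod 600 "$dir/config.toml"                  # the token is a secret
}

# Example: write_tabby_agent_config http://localhost:8080 "<token from web UI>"
```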
Everyday workflows
1. Connect a GitHub or GitLab repo in the admin UI so chat answers cite your own code.
2. Swap models from the Models tab: larger models for GPU servers, Qwen2.5-Coder-1.5B for laptops.
3. Use the Answer Engine tab for repo-wide questions like “where do we hash passwords”, with file citations.
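If you launched Tabby with the Docker command from the install step, one way to change the completion model is to recreate the container with a different `--model` value. The sketch below assumes a container named `tabby` (a hypothetical name; the original command does not set one), runs it detached with `-d` instead of `-it`, and assumes the model id exists in Tabby's registry. `DRY_RUN=1` prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: recreate the Tabby container with another completion model.
# Assumptions: container name "tabby" is hypothetical; the model id must exist
# in Tabby's model registry (e.g. StarCoder2-3B, Qwen2.5-Coder-1.5B).
tabby_swap_model() {
  local model="$1" device="${2:-cuda}"
  local cmd=(docker run -d --name tabby)
  if [ "$device" = "cuda" ]; then
    cmd+=(--gpus all)                     # only NVIDIA needs the GPU passthrough flag
  fi
  cmd+=(-p 8080:8080 -v "$HOME/.tabby:/data" tabbyml/tabby
        serve --model "$model" --device "$device")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"             # show what would run
    return 0
  fi
  docker rm -f tabby >/dev/null 2>&1      # drop the old container, if any
  "${cmd[@]}"                             # weights persist in $HOME/.tabby
}

# Example: tabby_swap_model Qwen2.5-Coder-1.5B cpu
```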
Gotchas and tips
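VRAM is the binding constraint: a 3B model fits in about 6 GB, 7B models want 12 GB or more, and anything larger benefits from int4 quantisation. Keep downloaded weights on persistent storage (the -v $HOME/.tabby:/data mount does this), or set TABBY_MODEL_CACHE_ROOT to a directory on a fast SSD, so they are not re-downloaded when the container is recreated.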
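The sizing rules above can be turned into a tiny lookup when provisioning several machines. This is a sketch that takes VRAM in whole gigabytes; the thresholds come from the paragraph above, while the specific model ids (StarCoder2-7B, StarCoder2-3B, Qwen2.5-Coder-1.5B) and the function name `pick_tabby_model` are illustrative assumptions to adjust to your registry and hardware.

```shell
#!/usr/bin/env bash
# Sketch: map available VRAM (whole GB) to a completion model id.
# Thresholds follow the rough rule: ~6 GB for 3B, 12 GB+ for 7B.
# Model ids are assumptions; check Tabby's registry for current names.
pick_tabby_model() {
  local vram_gb="$1"
  if (( vram_gb >= 12 )); then
    echo StarCoder2-7B
  elif (( vram_gb >= 6 )); then
    echo StarCoder2-3B
  else
    echo Qwen2.5-Coder-1.5B   # small enough for laptops and CPU
  fi
}

# Example (SSD path is hypothetical): mount a fast disk and point the cache at it
#   docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
#     -v /mnt/ssd/tabby-models:/models -e TABBY_MODEL_CACHE_ROOT=/models \
#     tabbyml/tabby serve --model "$(pick_tabby_model 8)" --device cuda
```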
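Tabby collects anonymous usage data by default; disable it by setting the environment variable TABBY_DISABLE_USAGE_COLLECTION=1 (with Docker, pass -e TABBY_DISABLE_USAGE_COLLECTION=1). For a team deployment, put the server behind an OAuth proxy: the built-in auth is token-based and not meant for direct internet exposure.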
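Because API access is token-based, it helps to confirm a token from the command line before wiring it into editors or a proxy. This sketch assumes `POST /v1/completions` accepts a JSON body with `language` and `segments.prefix`/`segments.suffix` and expects an `Authorization: Bearer` header; the prefix is a fixed sample so no JSON escaping is needed. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: send one completion request to check that a token is accepted.
# Assumptions: POST /v1/completions takes {"language", "segments": {"prefix",
# "suffix"}} and a Bearer token; the prefix is a fixed sample string.
tabby_completion_payload() {
  local lang="${1:-python}"
  case "$lang" in
    ""|*[!a-z0-9_-]*) echo "unsupported language id: $lang" >&2; return 1 ;;
  esac
  # printf turns \\n into a literal backslash-n, i.e. a JSON-escaped newline
  printf '{"language":"%s","segments":{"prefix":"def fib(n):\\n    ","suffix":""}}\n' "$lang"
}

tabby_smoke_test() {
  local url="${1:-http://localhost:8080}" token="$2"
  local payload
  payload="$(tabby_completion_payload python)" || return 1
  # -f makes curl exit non-zero on 401/403, so a bad token fails the check
  curl -fsS --max-time 30 \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/json" \
    -d "$payload" \
    "${url}/v1/completions"
}

# Example: tabby_smoke_test http://localhost:8080 "<token from web UI>"
```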
Who it’s for
Security-conscious teams, regulated industries, air-gapped shops, and hobbyists who want Copilot-style completions without a SaaS subscription.