How to Use Tabby
Deploying Tabby via Docker, model selection, GPU inference, Code Browser, IDE extensions, team setup.
Tabby is a self-hosted, open-source alternative to GitHub Copilot — you run the inference server on your own GPU and get private code completions across every editor.
Built by TabbyML, Tabby packages a Rust server, a curated set of small code models, and IDE extensions into a single Docker image. It’s popular with teams that can’t send source to third-party clouds but still want inline AI completions and repo-aware chat.
What it is
Tabby ships three things: an inference server (Rust, llama.cpp backend) that serves models like StarCoder2 or DeepSeek-Coder; editor plugins for VS Code, JetBrains, Vim, and Emacs; and a web Code Browser that indexes your Git repos for RAG-style chat. It runs on CPU, CUDA, ROCm, or Apple Metal.
Install / sign up
```shell
# Docker with NVIDIA GPU
docker run -it --gpus all \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder2-3B --device cuda

# Visit http://localhost:8080 and create the admin account
```
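If you have no GPU at hand, the same image runs on CPU. As a sketch, following the pattern above (expect noticeably higher latency, so a small model is the sensible pairing):

```shell
# CPU-only variant: drop --gpus and switch the device flag
docker run -it \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder2-3B --device cpu
```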
First session
Once the server is up, install the VS Code extension, point it at http://localhost:8080, and paste the token from the admin UI. Start typing — grey completions appear within a few hundred milliseconds on a mid-range GPU.
```shell
$ code --install-extension TabbyML.vscode-tabby

# In settings, set tabby.endpoint = http://localhost:8080
# Start typing a function signature, completions stream in
```
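If completions don't appear, it's worth confirming the server itself responds before debugging the editor. A quick check, assuming the default endpoint (the exact health path may vary by version; consult your server's API docs):

```shell
# Sanity-check that the Tabby server is reachable
curl -s http://localhost:8080/v1/health
```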
Everyday workflows
1. Connect a GitHub/GitLab repo in the admin UI so chat answers cite your own code.
2. Swap models from the Models tab — bigger models for servers, Qwen2.5-Coder-1.5B for laptops.
3. Use the Answer Engine tab for repo-wide questions like “where do we hash passwords” with file citations.
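The same server the IDE plugins talk to can also be exercised directly over HTTP, which is handy for scripting or smoke tests. A hedged sketch against the completion endpoint (the request shape and token handling here are assumptions — check the API reference your server version exposes):

```shell
# Request a completion directly; token comes from the admin UI
curl -s -X POST http://localhost:8080/v1/completions \
  -H "Authorization: Bearer $TABBY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"language": "python", "segments": {"prefix": "def fibonacci(n):\n    "}}'
```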
Gotchas and tips
VRAM is the binding constraint: a 3B model fits in 6GB, 7B models want 12GB+, and anything larger benefits from int4 quantisation. Set TABBY_MODEL_CACHE_ROOT to a fast SSD to avoid re-downloading weights on every container restart.
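Those figures fall out of simple arithmetic: model weights dominate VRAM, at roughly 2 bytes per parameter in fp16 (around 0.5 bytes per parameter after int4 quantisation), plus headroom for the KV cache. A back-of-envelope check:

```shell
# Rough VRAM for weights alone: parameters (billions) * bytes per weight
params_billions=3
bytes_per_param=2   # fp16
echo "$((params_billions * bytes_per_param))GB"
```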
Tabby’s default telemetry is anonymous and easy to disable with --no-usage-tracking. For a team deployment, put it behind an OAuth proxy — the built-in auth is token-based and not meant for internet exposure.
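The tips above can be folded into the run command. A sketch combining them (the SSD mount path is illustrative; binding to 127.0.0.1 keeps the token-based auth off the open network until a proxy is in front):

```shell
# Telemetry off, weights cached on a fast SSD, port bound to localhost only
docker run -it --gpus all \
  -p 127.0.0.1:8080:8080 \
  -e TABBY_MODEL_CACHE_ROOT=/models \
  -v /mnt/nvme/tabby-models:/models \
  -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder2-3B --device cuda --no-usage-tracking
```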
Who it’s for
Security-conscious teams, regulated industries, air-gapped shops, and hobbyists who want Copilot-style completions without a SaaS subscription.