Free Tool Arena


How to Use Tabby

Deploying Tabby via Docker, model selection, GPU inference, Code Browser, IDE extensions, team setup.

Updated April 2026 · 6 min read

Tabby is a self-hosted, open-source alternative to GitHub Copilot — you run the inference server on your own GPU and get private code completions across every editor.


Built by TabbyML, Tabby packages a Rust server, a curated set of small code models, and IDE extensions into a single Docker image. It’s popular with teams that can’t send source to third-party clouds but still want inline AI completions and repo-aware chat.

What it is

Tabby ships three things: an inference server (Rust, llama.cpp backend) that serves models like StarCoder2 or DeepSeek-Coder; editor plugins for VS Code, JetBrains, Vim, and Emacs; and a web Code Browser that indexes your Git repos for RAG-style chat. It runs on CPU, CUDA, ROCm, or Apple Metal.

Install / sign up

# Docker with NVIDIA GPU
docker run -it --gpus all \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder2-3B --device cuda

# Visit http://localhost:8080 and create the admin account
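No NVIDIA GPU? Tabby's `serve` command also accepts `--device cpu` (and `metal` on Apple Silicon). A minimal CPU-only sketch of the same command — slower, but fine for trying the 1B–3B models:

```shell
# Docker on CPU only: drop the --gpus flag and pass --device cpu
docker run -it \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder2-3B --device cpu
```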

First session

Once the server is up, install the VS Code extension, point it at http://localhost:8080, and paste the token from the admin UI. Start typing — grey completions appear within a few hundred milliseconds on a mid-range GPU.

$ code --install-extension TabbyML.vscode-tabby
# In settings, set tabby.endpoint = http://localhost:8080
# Start typing a function signature, completions stream in
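If completions don't appear, it helps to sanity-check the server outside the editor. A sketch using Tabby's HTTP API (endpoint paths as documented by TabbyML; `TABBY_TOKEN` here stands for the auth token copied from the admin UI):

```shell
# Health check: should return JSON describing the loaded model
curl -s http://localhost:8080/v1/health \
  -H "Authorization: Bearer $TABBY_TOKEN"

# Request a raw completion for a Python prefix
curl -s http://localhost:8080/v1/completions \
  -H "Authorization: Bearer $TABBY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"language": "python", "segments": {"prefix": "def fibonacci(n):\n    "}}'
```

If the health check responds but the editor stays silent, the problem is almost always the `tabby.endpoint` setting or a stale token.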

Everyday workflows

  1. Connect a GitHub/GitLab repo in the admin UI so chat answers cite your own code.
  2. Swap models from the Models tab — bigger models for servers, Qwen2.5-Coder-1.5B for laptops.
  3. Use the Answer Engine tab for repo-wide questions like “where do we hash passwords” with file citations.
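Model choice can also be set at launch: `serve` takes a completion model and, optionally, a separate chat model for the Answer Engine via `--chat-model`. A sketch (the exact model names are examples — pick ones from Tabby's model registry that fit your VRAM):

```shell
# Small completion model for latency, larger instruct model for chat
docker run -it --gpus all \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model Qwen2.5-Coder-1.5B \
        --chat-model Qwen2.5-Coder-7B-Instruct \
        --device cuda
```

Splitting the two makes sense because completions are latency-sensitive while chat tolerates a slower, smarter model.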

Gotchas and tips

VRAM is the binding constraint: a 3B model fits in 6GB, 7B models want 12GB+, and anything larger benefits from int4 quantisation. Set TABBY_MODEL_CACHE_ROOT to a fast SSD to avoid re-downloading weights on every container restart.
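The cache tip above boils down to mounting a fast disk into the container and pointing the cache variable at it. A sketch, where `/mnt/nvme/tabby` is a hypothetical SSD path on the host:

```shell
# Persist model weights on an SSD so restarts don't re-download them
docker run -it --gpus all \
  -p 8080:8080 \
  -v /mnt/nvme/tabby:/data \
  -e TABBY_MODEL_CACHE_ROOT=/data/models \
  tabbyml/tabby \
  serve --model StarCoder2-3B --device cuda
```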

Tabby’s default telemetry is anonymous and easy to disable by setting TABBY_DISABLE_USAGE_COLLECTION=1 in the server’s environment. For a team deployment, put it behind an OAuth proxy: the built-in auth is token-based and not meant for direct internet exposure.
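For completeness, a sketch of a launch with usage reporting switched off (assuming the TABBY_DISABLE_USAGE_COLLECTION environment variable from Tabby's configuration docs):

```shell
# Opt out of anonymous usage reporting for a team deployment
docker run -it --gpus all \
  -p 8080:8080 -v $HOME/.tabby:/data \
  -e TABBY_DISABLE_USAGE_COLLECTION=1 \
  tabbyml/tabby \
  serve --model StarCoder2-3B --device cuda
```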

Who it’s for

Security-conscious teams, regulated industries, air-gapped shops, and hobbyists who want Copilot-style completions without a SaaS subscription.
