How to Use Haystack
Installing haystack-ai, Pipelines, DocumentStore, retrievers, LLM generators, and running in production.
Haystack is deepset’s open-source Python framework for building production-grade LLM pipelines — RAG, agents, and search — with a clear component model.
Haystack has been around since before the ChatGPT era, when it focused on neural search. Haystack 2.0 (released 2024) modernised the API around typed components and pipelines, and it’s now one of the most production-focused alternatives to LangChain or LlamaIndex.
What it is
Pipelines are directed graphs of Components (retrievers, generators, rankers, converters) with typed input/output sockets. Document Stores (Elasticsearch, Weaviate, Qdrant, pgvector, OpenSearch, in-memory) hold the indexed content. Haystack ships first-party integrations for every major model provider and vector DB, plus a serverless option via deepset Cloud.
Install / sign up
# Core
pip install haystack-ai

# Integrations are separate packages
pip install qdrant-haystack anthropic-haystack

# Optional managed UI: https://cloud.deepset.ai
First session
A minimal RAG pipeline has three components: an embedding retriever, a prompt builder, and a generator. Wire them together and call run().
$ python
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

store = InMemoryDocumentStore()  # assume documents were indexed here with embeddings
tmpl = "Answer using:\n{% for doc in documents %}{{ doc.content }}\n{% endfor %}"

p = Pipeline()
p.add_component("retriever", InMemoryEmbeddingRetriever(document_store=store))
p.add_component("prompt", PromptBuilder(template=tmpl))
p.add_component("llm", OpenAIGenerator(model="gpt-4o"))
p.connect("retriever", "prompt.documents")
p.connect("prompt", "llm")

# emb is the query embedding, produced by an embedder such as OpenAITextEmbedder
print(p.run({"retriever": {"query_embedding": emb}}))
Everyday workflows
1. Build a document-grounded Q&A service over your company’s wiki and Confluence exports.
2. Add a Ranker component after retrieval to boost precision before hitting the LLM.
3. Deploy pipelines behind Hayhooks (FastAPI wrapper) for a REST endpoint you can scale with Kubernetes.
Gotchas and tips
Haystack’s strength is that pipelines serialise to YAML, which makes diffs and CI review easy. Keep prompts in templates, not hard-coded strings, so you can iterate without redeploying. Evaluation components (AnswerExactMatch, SASEvaluator) slot into the same pipeline graph, so you can test in CI.
For very large corpora, favour Elasticsearch or OpenSearch document stores over in-memory — the InMemoryDocumentStore is great for tutorials but not production. Streaming responses require the streaming_callback parameter on generators; it’s easy to miss and it changes how you consume output.
Who it’s for
Teams shipping RAG or search-centric LLM products who want a typed, observable, deployable framework rather than a notebook-style toolkit.