
Transformer (AI architecture)

Updated May 2026 · 4 min read

Definition

The transformer is the neural network architecture introduced in 2017 in 'Attention Is All You Need' (Vaswani et al.) that now underlies virtually all modern large language models, including GPT, Claude, Gemini, and Llama. It is built on self-attention rather than recurrence.

What it means

Pre-transformer NLP relied on recurrent networks (LSTMs, GRUs), which processed text one token at a time. Transformers process all tokens in parallel via self-attention: each token computes a weighted relevance score against every other token in the sequence, as sketched below. This parallelism made training on huge datasets feasible. Modern frontier LLMs are 'decoder-only' transformers (GPT-style); older translation models used the original encoder-decoder layout. Variants such as mixture-of-experts and sparse attention optimize the base architecture, but the transformer remains the foundation.
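To make the attention idea concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain NumPy. It omits the multi-head splitting, masking, and output projection of a real transformer layer, and every name in it is illustrative rather than taken from any library:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # every token scored against every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax -> attention weights
    return weights @ V                          # each output is a weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))                 # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (4, 8)
```

Stacking this operation with feed-forward layers, residual connections, and normalization gives the full transformer block.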

Why it matters

The transformer is to AI what the relational model was to databases: the architectural breakthrough that defined its era. Understanding it, at least the attention concept, helps you reason about why LLMs are good at certain tasks (long-range pattern recognition) and bad at others (true compositional reasoning).

Frequently asked questions

Will transformers be replaced?

Possibly, eventually. State-space models such as Mamba showed promise in 2024-2025, but transformers remain dominant in 2026. Don't bet on a near-term replacement.

Encoder vs decoder?

Decoder-only is the modern standard for LLMs (GPT, Claude, Gemini). Encoder-only models (BERT) are used for embeddings and classification; encoder-decoder models (T5) were used for translation and other sequence-to-sequence tasks. What makes a model 'decoder-only' at the attention level is the causal mask, sketched below.
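The causal mask prevents each token from attending to later positions, which is what lets a decoder-only model be trained to predict the next token. A minimal NumPy sketch, with illustrative names not taken from any library:

```python
import numpy as np

def causal_mask(seq_len):
    # -inf above the diagonal: token i may attend only to tokens 0..i.
    return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

scores = np.zeros((4, 4)) + causal_mask(4)  # added to attention scores before softmax
print(scores)  # row i holds -inf for every j > i, so softmax gives those positions zero weight
```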
