
Regex Cheat Sheet: All Patterns Explained

By FreeToolArena Staff · Updated May 2026 · 6 min read

A complete regex reference: every operator, the differences between flavors (ECMAScript, PCRE, Python, Go RE2), and a library of patterns that covers most everyday matching tasks. Patterns are written in ECMAScript syntax, with flavor notes where engines differ. Use this as a working reference: bookmark it, search in-page for what you need, then copy and adapt.

Most regex tutorials over-explain syntax and under-explain the engine differences that bite you in production. This guide goes the other way: short syntax recap, long pattern library, and explicit flavor warnings.

Core syntax recap (in 90 seconds)

  • . — any character except newline (use s flag for “dotall” mode where dot matches newlines too).
  • \d — digit. [0-9] in ECMAScript (even with the u flag), Go RE2, and PCRE by default. In Python 3, \d on str patterns also matches other Unicode decimal digits unless you pass re.ASCII.
  • \w — word character. [a-zA-Z0-9_] in most flavors.
  • \s — whitespace (space, tab, newline, etc.).
  • \D \W \S — uppercase = negated.
  • [abc] — character class: a, b, or c. [a-z] — range. [^abc] — negation.
  • | — alternation: cat|dog matches cat OR dog.
  • ? — 0 or 1 occurrences. * — 0 or more. + — 1 or more.
  • {n} — exactly n. {n,} — n or more. {n,m} — between n and m.

Anchors and boundaries

  • ^ — start of string. With m flag, start of line.
  • $ — end of string. With m flag, end of line.
  • \b — word boundary (between \w and \W). \bcat\b matches “cat” but not “catalog”.
  • \B — non-word boundary. \Bcat\B matches “concatenate” but not “cat box”.
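  • \A — absolute start of string, unaffected by multiline mode (Python, PCRE). Not in ECMAScript.
  • \Z / \z — end of string (Python, PCRE). In PCRE, \Z also matches just before a final newline and \z is the absolute end; in Python, \Z is the absolute end. Not in ECMAScript.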

Most common gotcha: ^ and $ default to string start/end, not line start/end. To match line by line, add the multiline m flag: /^foo$/m.
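The gotcha above is easy to see on a small input. This is a minimal sketch; the three-line string is a made-up sample, not data from a real system.

```javascript
// Hypothetical three-line input used only for illustration.
const text = "foo\nbar\nfoo";

// Without the m flag, ^ and $ anchor to the whole string, so nothing matches.
const wholeString = text.match(/^foo$/g);

// With the m flag, ^ and $ also match at line breaks.
const perLine = text.match(/^foo$/gm);

console.log(wholeString); // null
console.log(perLine);     // [ 'foo', 'foo' ]
```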

Quantifiers: greedy vs lazy vs possessive

Three quantifier strategies in modern regex engines (not all flavors support all three):

  • Greedy (default): match as much as possible, then back off. .* on “abc” matches “abc”.
  • Lazy / reluctant: .*?, .+?. Match as little as possible. Useful for “match between delimiters” patterns.
  • Possessive: .*+, .++. Like greedy but never gives back what it matched, so a failing match fails fast. Available in PCRE, Java, Ruby, and Python re 3.11+; NOT in ECMAScript or Go RE2.

Worked example on <b>hello</b> <b>world</b>:

  • Greedy <b>.*</b> → matches the entire string (one big match).
  • Lazy <b>.*?</b> → matches each <b>...</b> separately.
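The worked example can be reproduced in a few lines of ECMAScript. This is a sketch using String.prototype.matchAll; the variable names are illustrative.

```javascript
const html = "<b>hello</b> <b>world</b>";

// Greedy: .* runs to the end of the string, then backs off to the LAST </b>.
const greedy = [...html.matchAll(/<b>.*<\/b>/g)].map((m) => m[0]);

// Lazy: .*? stops at the FIRST </b> it can reach.
const lazy = [...html.matchAll(/<b>.*?<\/b>/g)].map((m) => m[0]);

console.log(greedy); // [ '<b>hello</b> <b>world</b>' ]
console.log(lazy);   // [ '<b>hello</b>', '<b>world</b>' ]
```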

Character classes and shortcuts

  • [abc] — one of a, b, or c.
  • [a-zA-Z0-9] — alphanumeric ASCII.
  • [^abc] — NOT a, b, or c (one char).
  • [\d.-] — digit, dot, or hyphen. Inside [], most metacharacters lose special meaning. - goes at start/end to be literal.
  • \p{Letter} — Unicode property class: any letter (Greek, Cyrillic, etc.). Requires u flag in ECMAScript.
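  • \p{Number} — any Unicode numeric character: decimal digits (Arabic-Indic, Devanagari, etc.) plus numerals such as Roman numeral characters and fractions. Use \p{Nd} for decimal digits only.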
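To make the difference between \d and the Unicode property classes concrete, here is a small ECMAScript sketch. The sample string is invented for illustration and mixes Greek, Cyrillic, Devanagari digits, an ASCII digit, and a Roman numeral character.

```javascript
// Made-up sample: Greek + Latin letters, Cyrillic, Devanagari digits, ASCII digit, Roman numeral.
const sample = "Ωmega Привет ४२ 7 Ⅷ";

// \p{...} requires the u flag in ECMAScript.
const letters = sample.match(/\p{Letter}+/gu);
const decimalDigits = sample.match(/\p{Nd}+/gu);
const anyNumber = sample.match(/\p{Number}+/gu);

// \d stays ASCII-only in ECMAScript, even with the u flag.
const asciiDigits = sample.match(/\d+/gu);

console.log(letters);       // [ 'Ωmega', 'Привет' ]
console.log(decimalDigits); // [ '४२', '7' ]
console.log(anyNumber);     // [ '४२', '7', 'Ⅷ' ]
console.log(asciiDigits);   // [ '7' ]
```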

Groups, captures, backreferences

  • (abc) — capturing group. Accessible as $1 in replace, match[1] in code.
  • (?:abc) — non-capturing group. Same grouping behavior, no capture overhead.
  • (?<name>abc) — named capture. Accessible as match.groups.name.
  • \1 \2 ... — backreference to captured group. (a)\1 matches “aa”.
  • \k<name> — backreference by name.

Worked example: extract user and domain from email. Pattern: (?<user>\w+)@(?<domain>[\w.-]+). On “hello@example.com”: match.groups.user === "hello", match.groups.domain === "example.com".
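The same example as runnable ECMAScript, plus the two other places named groups show up: $<name> in replacement strings and \k<name> as a backreference. The doubled-word pattern is an added illustration, not part of the original example.

```javascript
const emailPattern = /(?<user>\w+)@(?<domain>[\w.-]+)/;

// Named captures are exposed on match.groups.
const match = "hello@example.com".match(emailPattern);
const { user, domain } = match.groups;
console.log(user, domain); // hello example.com

// In a replacement string, refer to a named group with $<name>.
const swapped = "hello@example.com".replace(emailPattern, "$<domain> / $<user>");
console.log(swapped); // example.com / hello

// \k<name> is a backreference by name: here it finds an accidentally doubled word.
const doubledWord = /\b(?<word>\w+)\s+\k<word>\b/;
console.log(doubledWord.test("it was was fine")); // true
console.log(doubledWord.test("it was fine"));     // false
```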

Lookahead and lookbehind

Zero-width assertions: they check whether a position has certain context, but don’t consume characters.

  • (?=...) — positive lookahead. foo(?=bar) matches “foo” only if followed by “bar”.
  • (?!...) — negative lookahead. foo(?!bar) matches “foo” not followed by “bar”.
  • (?<=...) — positive lookbehind. (?<=foo)bar matches “bar” preceded by “foo”.
  • (?<!...) — negative lookbehind. (?<!foo)bar matches “bar” NOT preceded by “foo”.

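Flavor support: lookahead works in every backtracking flavor. Lookbehind must be fixed-width in Python re (the third-party regex module allows variable width); ECMAScript has supported lookbehind of any width since ES2018. Go RE2 has no lookaround at all, which is part of how it keeps its linear-time guarantee.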
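A short ECMAScript sketch of the assertions above (requires an ES2018+ engine for the lookbehind). The input strings are invented, and the thousands-separator pattern is a commonly used idiom added here for illustration.

```javascript
// Positive lookbehind: grab the digits after "$" without including the "$".
const prices = "cost: $42, tax: $7, qty: 3".match(/(?<=\$)\d+/g);
console.log(prices); // [ '42', '7' ]

// Negative lookahead: "foo" only where it is NOT followed by "bar".
const positions = [..."foobar foobaz".matchAll(/foo(?!bar)/g)].map((m) => m.index);
console.log(positions); // [ 7 ]

// Zero-width matches make insertions easy: add thousands separators.
// \B avoids the start of the number; the lookahead requires groups of 3 digits to the end.
const grouped = "1234567".replace(/\B(?=(\d{3})+(?!\d))/g, ",");
console.log(grouped); // 1,234,567
```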

Engine differences (ECMAScript, PCRE, Python, Go)

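| Feature | ECMAScript | PCRE / Perl | Python re | Go RE2 |
| --- | --- | --- | --- | --- |
| Lookbehind | ES2018+ (any width) | Yes (fixed-width per alternative; recent versions allow bounded variable width) | Fixed-width only | NO |
| Possessive quantifiers | NO | Yes | 3.11+ | NO |
| Recursion / subroutines | NO | Yes | NO | NO |
| Named groups | (?<name>) | (?P<name>) or (?<name>) | (?P<name>) | (?P<name>); (?<name>) since Go 1.22 |
| Backtracking | Yes | Yes | Yes | NO (linear time) |
| Unicode property classes | With u flag | Yes | NO (third-party regex module only) | Yes |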

Practical implication: a pattern that works in regex101.com’s PCRE mode may fail in your JavaScript code. Always test in the engine you’ll deploy to. The browser regex tester runs ECMAScript, which mirrors production only if your production code is JavaScript.
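One way to check what your ECMAScript engine accepts is to try compiling the syntax and catch the SyntaxError. This is a minimal sketch; supportsSyntax is a hypothetical helper name, and it only tells you whether the pattern parses, not whether it behaves the same as in another flavor.

```javascript
// Hypothetical helper: true if this engine can compile the given pattern source.
function supportsSyntax(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

console.log(supportsSyntax("a++"));              // false: possessive quantifier (PCRE, Java, Ruby)
console.log(supportsSyntax("(?>a+)b"));          // false: atomic group
console.log(supportsSyntax("(?P<year>\\d{4})")); // false: Python-style named group
console.log(supportsSyntax("(?<year>\\d{4})"));  // true: ECMAScript named group
console.log(supportsSyntax("\\p{Letter}", "u")); // true: Unicode property class with u flag
```

Run this in the same runtime you deploy to (browser or Node.js), since support for newer syntax depends on the engine version.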

Common patterns: validation

Each pattern is in ECMAScript flavor unless noted. Translate as needed.

Email (pragmatic)

/^[\w.+-]+@[\w-]+\.[\w.-]+$/

Don’t try to match the full RFC address grammar: the widely circulated regex for it (written for RFC 822) runs to more than 6,000 characters. The pattern above accepts ~99.9% of real emails and rejects most invalid input. For bullet-proof validation, send a confirmation email instead.

URL (HTTP/HTTPS)

/^https?:\/\/[\w.-]+(?::\d+)?(?:\/[^\s]*)?$/

US phone number

/^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/

Matches: (415) 555-1234, 415-555-1234, 415.555.1234, 4155551234.

IPv4 address

/^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/
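A quick check of the IPv4 pattern against a few hand-picked inputs shows what it accepts and rejects. The sample addresses are illustrative. Note that the pattern allows leading zeros, which some parsers treat as octal, so reject them separately if that matters to you.

```javascript
const ipv4 = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

// Illustrative inputs, chosen to hit the edges of each octet alternative.
const samples = ["192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3", "01.02.03.004"];

const results = samples.map((s) => [s, ipv4.test(s)]);
console.log(results);
// [ [ '192.168.0.1', true ],
//   [ '255.255.255.255', true ],
//   [ '256.1.1.1', false ],     octet out of range
//   [ '1.2.3', false ],         only three octets
//   [ '01.02.03.004', true ] ]  leading zeros are accepted
```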

Strong password (8+ chars, mixed case, digit, special)

/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/

Better approach: skip composition rules entirely, require length 12+, and check against breach databases (HIBP). Modern security guidance has moved away from composition requirements.

ISO 8601 date (YYYY-MM-DD)

/^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/
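The date pattern checks the shape of the string only, so an impossible date such as 2023-02-31 still passes. A possible follow-up is to round-trip the parts through Date; this is a sketch, and isRealDate is a hypothetical helper name.

```javascript
const isoDate = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: format check first, then a calendar check.
function isRealDate(text) {
  if (!isoDate.test(text)) return false;
  const [year, month, day] = text.split("-").map(Number);
  // setUTCFullYear avoids the two-digit-year remapping of the Date constructor.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);
  // An impossible day rolls over into the next month, so the parts stop matching.
  return date.getUTCMonth() === month - 1 && date.getUTCDate() === day;
}

console.log(isoDate.test("2023-02-31")); // true: the shape is fine
console.log(isRealDate("2023-02-31"));   // false: February has no 31st
console.log(isRealDate("2024-02-29"));   // true: leap year
console.log(isRealDate("2023-02-29"));   // false: not a leap year
```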

Hex color

/^#?(?:[0-9a-f]{3}|[0-9a-f]{6})$/i

Slug (URL-safe identifier)

/^[a-z0-9]+(?:-[a-z0-9]+)*$/

Common patterns: extraction

Match between delimiters (lazy)

/<title>(.*?)<\/title>/

Caveat: don’t parse HTML with regex for anything beyond the simplest cases. Use DOMParser instead.

All numbers in a string

/-?\d+(?:\.\d+)?/g

Quoted strings (handles escaped quotes)

/"((?:[^"\\]|\\.)*)"/g
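Here is how the quoted-string pattern behaves on an input with escaped quotes. This is a sketch: the sample line is invented, and the unescape step at the end is an added convenience that only strips the backslashes.

```javascript
const quoted = /"((?:[^"\\]|\\.)*)"/g;

// String.raw keeps the backslashes literal: say "hi" then "a \"nested\" quote"
const line = String.raw`say "hi" then "a \"nested\" quote"`;

// Group 1 is the content between the outer quotes, escapes still in place.
const rawContents = [...line.matchAll(quoted)].map((m) => m[1]);

// Optional: drop the escaping backslash in front of each escaped character.
const unescaped = rawContents.map((s) => s.replace(/\\(.)/g, "$1"));

console.log(unescaped); // [ 'hi', 'a "nested" quote' ]
```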

Hashtags from a tweet

/#\w+/g

Markdown link

/\[([^\]]+)\]\(([^)]+)\)/g

Captures: $1 = link text, $2 = URL.
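With matchAll, the two captures can be collected into objects in one pass. A minimal sketch on a made-up Markdown snippet:

```javascript
const mdLink = /\[([^\]]+)\]\(([^)]+)\)/g;

// Invented sample text with two links.
const markdown = "See [the docs](https://example.com/docs) and [home](/).";

// Skip m[0] (the whole match); keep group 1 (text) and group 2 (URL).
const links = [...markdown.matchAll(mdLink)].map(([, text, url]) => ({ text, url }));

console.log(links);
// [ { text: 'the docs', url: 'https://example.com/docs' },
//   { text: 'home', url: '/' } ]
```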

CSV row (simple, no embedded commas)

/[^,\n]+/g

For real CSV with quoted fields and embedded commas, use a CSV parser library.

Common patterns: replacement

Strip HTML tags

text.replace(/<[^>]+>/g, '')

Collapse whitespace runs (spaces, tabs, newlines)

text.replace(/\s+/g, ' ').trim()

Convert camelCase to snake_case

text.replace(/([a-z])([A-Z])/g, '$1_$2').toLowerCase()

Mask email middle

email.replace(/^(.{2}).*?(@.*)$/, '$1***$2')

Output: he***@example.com

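Convert US phone to E.164

text.replace(/\D/g, '').replace(/^/, '+1')

This assumes text holds a 10-digit US number with no country code; an input that already starts with 1 or +1 ends up with a doubled prefix.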
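The replacement one-liners above can be wrapped as small functions so their edge cases are visible. This is a sketch under stated assumptions: the function names are hypothetical, and toE164US adds a length check that the one-liner does not have.

```javascript
// camelCase -> snake_case. Runs of capitals (acronyms) are not split.
const camelToSnake = (s) => s.replace(/([a-z])([A-Z])/g, "$1_$2").toLowerCase();

// Keep the first two characters and the domain, mask the rest of the local part.
const maskEmail = (email) => email.replace(/^(.{2}).*?(@.*)$/, "$1***$2");

// Hypothetical stricter variant of the E.164 conversion for US numbers.
function toE164US(text) {
  const digits = text.replace(/\D/g, "");
  if (digits.length === 10) return "+1" + digits;
  if (digits.length === 11 && digits.startsWith("1")) return "+" + digits;
  return null; // not a recognizable US number
}

console.log(camelToSnake("userAccountId"));     // user_account_id
console.log(camelToSnake("parseHTTPResponse")); // parse_httpresponse
console.log(maskEmail("hello@example.com"));    // he***@example.com
console.log(toE164US("(415) 555-1234"));        // +14155551234
console.log(toE164US("+1 415-555-1234"));       // +14155551234
console.log(toE164US("555-1234"));              // null
```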

Catastrophic backtracking and ReDoS

Regular Expression Denial of Service (ReDoS) is a real attack class. Vulnerable patterns have nested quantifiers that produce exponential paths on adversarial input. The classic example:

/^(a+)+$/

On input “aaaaaaaaaaaaaaaaaaa!” (the trailing ! makes a match impossible), a backtracking engine tries every possible way to split the a’s between the inner and outer quantifier before giving up. Time grows roughly as 2^n: 30 a’s means about a billion paths, enough to hang a process for many seconds.

Common ReDoS patterns to audit:

  • (a+)+, (a*)* — nested quantifiers on overlapping classes.
  • (a|aa)+ — alternation with overlap.
  • Email regex ^([a-zA-Z0-9._-]+)+@ — nested group with permissive inner quantifier.

Defenses: (1) For untrusted input, use a linear-time engine such as Google’s open-source RE2 library, Go’s regexp package, or Rust’s regex crate. (2) Add timeouts when running user-supplied patterns. (3) Use static analysis tools (rxxr2, safe-regex) to flag risky patterns. (4) Avoid nested quantifiers; prefer atomic groups (?>...) or possessive quantifiers where supported.
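The growth is easy to observe on a backtracking engine with small inputs. The sketch below times the vulnerable pattern against a rewrite without nesting; timeMs is a hypothetical helper, the input sizes are kept small on purpose, and the actual timings depend on your engine and machine.

```javascript
const vulnerable = /^(a+)+$/;
const rewritten = /^a+$/; // accepts the same strings, with no nested quantifier

// Hypothetical helper: milliseconds spent on a single test() call.
function timeMs(re, input) {
  const start = performance.now();
  re.test(input);
  return performance.now() - start;
}

// Keep n small: each extra "a" roughly doubles the work for the vulnerable pattern.
for (const n of [8, 12, 16, 20]) {
  const input = "a".repeat(n) + "!";
  const slow = timeMs(vulnerable, input).toFixed(2);
  const fast = timeMs(rewritten, input).toFixed(2);
  console.log(`n=${n} vulnerable=${slow}ms rewritten=${fast}ms`);
}
```

Before swapping a pattern for a rewrite like this, confirm on representative inputs that both accept and reject the same strings.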

Performance tips

  • Anchor your patterns: when a match can only start at position 0, ^abc lets the engine give up after one attempt, while an unanchored abc is retried at every position of a long input.
  • Prefer character classes to alternation: [abc] is faster than a|b|c.
  • Compile once, reuse many times: in Python (re.compile()) and Java (Pattern.compile()), keep the compiled pattern outside hot loops. In ECMAScript, hoist new RegExp(...) calls out of loops; engines generally cache the compiled form of regex literals.
  • Use non-capturing groups (?:...) when you don’t need the capture — saves memory.
  • Profile before optimizing: most regex performance issues are catastrophic backtracking, not micro-optimization. Use a regex profiler.
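As an ECMAScript illustration of the compile-once and non-capturing-group tips, the sketch below builds a pattern from a runtime string a single time and reuses it across lines. escapeRegExp and countLinesWithWord are hypothetical helper names; the escaping step is an assumption added so the word is matched literally.

```javascript
// Hypothetical helper: escape regex metacharacters so the input is matched literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: count the lines that contain `word` as a whole word.
function countLinesWithWord(lines, word) {
  // Compiled once, outside the loop. (?:...) groups without storing a capture.
  const wholeWord = new RegExp(`\\b(?:${escapeRegExp(word)})\\b`);
  let count = 0;
  for (const line of lines) {
    if (wholeWord.test(line)) count += 1;
  }
  return count;
}

console.log(countLinesWithWord(["a cat", "concatenate", "cat"], "cat")); // 2
```

The pattern deliberately omits the g flag: a global regex keeps state in lastIndex between test() calls, which is a common source of surprises when one instance is reused.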

Don’t do these

  • Parse HTML with regex. HTML is recursive; regex isn’t. Use DOMParser, BeautifulSoup, jsoup, or html.parser.
  • Parse JSON with regex. Use JSON.parse or your language’s equivalent.
  • Match every RFC-valid email with one regex. The widely circulated full-grammar regex (written for RFC 822) runs to more than 6,000 characters; almost nobody uses it. Validate format with a pragmatic pattern, then send a confirmation email.
  • Validate SQL identifiers with permissive regex. Use parameterized queries; don’t hand-roll SQL injection prevention.
  • Match balanced delimiters with regex. Recursion is required; most regex engines don’t support it. Use a stack-based parser.
  • Trust user-supplied regex without timeouts. ReDoS will hang your process.
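For the balanced-delimiter case in the list above, a stack-based check is short. This is a minimal sketch, not a full parser: isBalanced is a hypothetical name, and the function ignores string literals and comments that a real language parser would have to handle.

```javascript
// Hypothetical helper: true if (), [] and {} are properly nested in `text`.
function isBalanced(text) {
  const openerFor = new Map([[")", "("], ["]", "["], ["}", "{"]]);
  const openers = new Set(openerFor.values());
  const stack = [];

  for (const ch of text) {
    if (openers.has(ch)) {
      stack.push(ch);
    } else if (openerFor.has(ch)) {
      // A closer must match the most recent unclosed opener.
      if (stack.pop() !== openerFor.get(ch)) return false;
    }
  }
  return stack.length === 0; // leftover openers mean unbalanced input
}

console.log(isBalanced("f(a[1], {b: 2})")); // true
console.log(isBalanced("(]"));              // false
console.log(isBalanced("(("));              // false
```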

The 80/20 takeaway

Master 6 things and you can handle ~95% of real-world regex tasks: character classes, quantifiers (greedy and lazy), anchors, capture groups, alternation, backreferences. The rest (lookaround, possessive quantifiers, atomic groups) is situational. Test in the exact engine you’ll deploy to (the regex tester uses ECMAScript). Audit any pattern that handles untrusted input for ReDoS. And always have a non-regex fallback ready — HTML parsers, JSON parsers, real CSV libraries — for cases regex can’t handle correctly.

Frequently asked questions

What's the difference between regex flavors?

Major engines: ECMAScript (browsers, Node.js), PCRE (PHP) and Perl's own engine, Python re, Go RE2, Java's java.util.regex, Ruby. Differences include lookbehind support (Go RE2 has none), recursion (PCRE and Perl; not ECMAScript, Python re, or Go), Unicode handling, and possessive quantifiers (PCRE, Java, Ruby, Python 3.11+; not ECMAScript or Go). The same pattern can match in one flavor and fail in another. Always test in the exact engine you'll deploy to.

Why is my regex pattern hanging or timing out?

Likely catastrophic backtracking — a ReDoS pattern. Common culprits: nested quantifiers like (a+)+, alternation with overlap like (a|aa)+, permissive nested groups like ([a-z]+)+. Time grows exponentially with input length. Defenses: (1) rewrite the pattern to remove nested quantifiers, (2) use Go RE2 or RE2-compatible engines for untrusted input (linear-time guaranteed), (3) add execution timeouts, (4) run static analyzers like safe-regex to flag risky patterns.

How do I write a regex for emails properly?

Don't try for full RFC perfection: the widely circulated regex for the complete address grammar (written for RFC 822) runs to more than 6,000 characters. Use a pragmatic pattern like /^[\w.+-]+@[\w-]+\.[\w.-]+$/ that catches ~99.9% of real emails and rejects most invalid input. For high-stakes validation (signup forms): pragmatic regex first to filter typos, then send a confirmation email. Only the inbox owner can click the link, which proves both syntactic AND deliverable validity. Don't combine validation regex with deliverability checks; separate concerns.

What's the fastest regex engine?

For guaranteed worst-case performance, RE2-style engines (Google's RE2 library, Go's regexp package) are the safest choice: matching time is linear in input length, with no catastrophic backtracking. Trade-off: no lookaround, no backreferences, no recursion. For feature-rich speed, PCRE2 with JIT compilation is among the fastest. ECMAScript engines (V8 in Node/Chrome) are fast for most patterns due to heavy optimization but vulnerable to ReDoS on adversarial input. Python re is a backtracking engine without a JIT and tends to be slower than those; the third-party 'regex' module mainly adds features, so benchmark on your own patterns before switching for speed.
