Regex
Definition
Regex (short for regular expressions) is a notation for describing patterns in text, used for searching, matching, replacing, splitting, and validating. Most mainstream languages ship a regex engine in the core language or standard library; the syntax largely overlaps across engines, but each one has its own gotchas.
What it means
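Core operators: literal characters; character classes ([abc], [a-z], \d for digits, \w for word characters, \s for whitespace); quantifiers (*, +, ?, {n,m}); anchors (^, $, and the word boundary \b); groups ((...) capturing, (?:...) non-capturing, (?<name>...) named); alternation (|); and lookaround: lookahead (?=...) and lookbehind (?<=...), with negative forms (?!...) and (?<!...). Engines diverge on Unicode support (\p{L} matches any Unicode letter in PCRE, .NET, and JavaScript with the u flag, but not in POSIX regex or Python's built-in re module), on backtracking limits, and on named-group syntax (Python's re writes (?P<name>...)). JavaScript's syntax has been close to Perl/PCRE since ES2018 added lookbehind, named groups, and Unicode property escapes, though it still lacks PCRE features such as atomic groups and recursion.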
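The operators above can be sketched in Python's built-in re module. This is a minimal illustration, not a reference: the sample strings and the names DATE, PETS, DOLLARS, LETTERS, and parse_date are made up for the example. It also shows two of the engine differences mentioned above: Python spells named groups (?P<name>...), and it has no \p{L}, so the class [^\W\d_] is used here as an approximation of "any Unicode letter".

```python
import re

# Named groups: Python's re spells them (?P<name>...), not (?<name>...).
DATE = re.compile(r"\b(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\b")

# Non-capturing group with alternation, an optional "s", and word boundaries.
PETS = re.compile(r"\b(?:cat|dog)s?\b")

# Lookbehind for "$" and lookahead for ".NN": only the whole-dollar digits match.
DOLLARS = re.compile(r"(?<=\$)\d+(?=\.\d{2})")

# Python's re has no \p{L}; [^\W\d_] (word chars minus digits and underscore)
# is an approximation of "any Unicode letter", not an exact equivalent.
LETTERS = re.compile(r"[^\W\d_]+")


def parse_date(text):
    """Return the first ISO-style date in text as a dict of strings, or None."""
    m = DATE.search(text)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_date("Build finished on 2024-03-15."))
    print(PETS.findall("cats, dog, catalog"))          # "catalog" is rejected by \b
    print(DOLLARS.findall("Total: $42.50, tax $3.99"))
    print(LETTERS.findall("naïve café 123"))
```

Porting these patterns to JavaScript would mean rewriting the named group as (?<year>...) and, for the letter class, using \p{L} together with the u flag.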
Why it matters
Regex is the right tool for validating the shape of input (email, URL, phone number; be skeptical of any 'perfect email regex'), extracting structured data from text logs, find-and-replace at scale, and simple parsing of well-defined formats. It is the wrong tool for parsing HTML, JSON, or other nested structures (use a proper parser; see the famous Stack Overflow answer), for matching natural language, and for any case where untrusted input might trigger catastrophic backtracking: in a backtracking engine, (a+)+$ can take exponential time on a long run of a's followed by a non-matching character, and exploiting that is known as ReDoS (regular expression denial of service).
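As a sketch of the "right tool" cases, the Python snippet below extracts fields from log lines and performs a deliberately loose shape check on an email address. The log format (ISO timestamp, level, message) and the names LOG_LINE, parse_log_line, error_messages, and looks_like_email are assumptions made for this example, not a standard. The email pattern only checks for "something@something.something" and is not a full validator.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical log format: "<ISO timestamp> <LEVEL> <free-text message>".
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>DEBUG|INFO|WARN|ERROR)\s+"
    r"(?P<message>.*)$"
)

# Deliberately loose shape check: one "@", no whitespace, a dot in the domain.
EMAIL_SHAPE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def parse_log_line(line: str) -> Optional[dict]:
    """Return the named fields of a log line, or None if it does not fit the format."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None


def error_messages(lines: Iterable[str]) -> List[str]:
    """Collect the message text of every ERROR line, skipping unparseable lines."""
    out = []
    for line in lines:
        record = parse_log_line(line)
        if record and record["level"] == "ERROR":
            out.append(record["message"])
    return out


def looks_like_email(text: str) -> bool:
    """Shape check only; real validation means sending a confirmation email."""
    return EMAIL_SHAPE.fullmatch(text) is not None


if __name__ == "__main__":
    sample = [
        "2024-03-15T10:22:01 INFO service started",
        "2024-03-15T10:22:07 ERROR disk full on /dev/sda1",
        "garbage line",
    ]
    print(error_messages(sample))
    print(looks_like_email("user@example.com"), looks_like_email("not-an-email"))
```

Lines that do not fit the pattern return None instead of raising, so one malformed line does not abort a batch job.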
Frequently asked questions
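Why is parsing HTML with regex bad?
HTML allows arbitrary nesting, and classic regular expressions cannot match arbitrarily nested structures (a few engines such as PCRE add recursion, but that still does not cover the rest of HTML's syntax). You will always find an edge case (commented-out tags, quotes inside attribute values, nested elements) that breaks a naive regex. Use a real parser: cheerio, BeautifulSoup, jsdom.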
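The nesting problem is easy to reproduce. The sketch below runs a naive pattern against a two-level div and then walks the same markup with html.parser from the Python standard library (used here only to keep the example dependency-free; the parsers named above would serve the same purpose). The sample markup and the names naive_div_contents, DivText, and div_text_by_depth are made up for illustration.

```python
import re
from html.parser import HTMLParser

SAMPLE = "<div>outer <div>inner</div> tail</div>"


def naive_div_contents(html):
    """Naive attempt: the lazy match stops at the FIRST closing tag it sees."""
    return re.findall(r"<div>(.*?)</div>", html)


class DivText(HTMLParser):
    """Tracks <div> nesting depth and records text together with its depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []  # list of (depth, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append((self.depth, text))


def div_text_by_depth(html):
    parser = DivText()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    print(naive_div_contents(SAMPLE))  # pairs the outer opening tag with the inner closing tag
    print(div_text_by_depth(SAMPLE))   # text attributed to the correct nesting level
```

The regex pairs the outer opening tag with the inner closing tag, while the parser keeps a depth counter, which is exactly the state a classic regular expression cannot carry.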
What's catastrophic backtracking?
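Catastrophic backtracking happens when nested or overlapping quantifiers make a backtracking engine take exponential time on certain inputs. Classic case: `(a+)+$` against `aaaaaaaaaaaaaaaaaaaaaaaaaaaaa!`. The engine tries every way of splitting the run of a's between the inner and outer quantifier before it fails. Mitigation: avoid nested quantifiers, use atomic groups or possessive quantifiers where the engine supports them, or set a regex timeout.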
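A rough way to see the effect is to time the nested pattern on inputs of growing length and compare it with a flattened pattern that accepts the same strings. The sketch below does this with Python's re module; the names NESTED, FLAT, time_match, and the chosen input lengths are assumptions for the example, and actual timings depend on the machine and Python version. Lengths are kept small on purpose: each extra character roughly doubles the work for the nested pattern.

```python
import re
import time

NESTED = re.compile(r"(a+)+$")  # nested quantifier: vulnerable in backtracking engines
FLAT = re.compile(r"a+$")       # same accepted strings, no nesting


def time_match(pattern, text):
    """Return (matched, seconds) for pattern.match(text)."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Keep n small: the nested pattern's running time grows roughly as 2**n.
    for n in (10, 14, 18):
        text = "a" * n + "!"  # almost matches, then fails at the last character
        _, slow = time_match(NESTED, text)
        _, fast = time_match(FLAT, text)
        print(f"n={n:2d}  nested={slow:.5f}s  flat={fast:.5f}s")
```

Rewriting the pattern is the most portable fix. Atomic groups, possessive quantifiers, and timeouts are engine-specific, so check what your engine offers before relying on them.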
What tools help with building regex?
regex101.com, regexr.com, and our own /tools/regex-tester. Test against representative inputs before shipping.