How to Remove Extra Whitespace
Trimming, collapsing runs of spaces, non-breaking spaces, tab-to-space, and preserving code indentation.
Extra whitespace is the silent ugliness of text pipelines. Trailing spaces break diffs, runs of spaces ruin alignment, tabs mixed with spaces break code, and non-breaking spaces pasted from Word look identical to regular spaces but compare unequal and cause string matches to fail mysteriously. “Remove extra whitespace” is a dozen different operations depending on what you mean: trim, collapse runs, convert tabs, strip invisible variants, or normalize the lot. This guide covers each operation, the regex patterns that actually work, and the cases where you deliberately don’t want to strip, such as code indentation.
What counts as whitespace
More than just the space character. The Unicode whitespace class includes:
- Regular space (U+0020)
- Tab (U+0009)
- Line feed, carriage return, form feed, vertical tab
- Non-breaking space (U+00A0) — looks like space, isn’t
- En space, em space, thin space, hair space (U+2000 to U+200A)
- Zero-width space (U+200B) — technically not whitespace in Unicode, but often treated as one
- Ideographic space (U+3000) — full-width space from CJK
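To see which of these characters are hiding in a string, a small inspection helper can be useful. The following is a minimal sketch: the name findWhitespace is hypothetical, and the zero-width space is added to the pattern explicitly on the assumption that you want it reported, since JavaScript's \s does not match it.

```javascript
// Hypothetical helper: report every whitespace-like character in a string.
function findWhitespace(str) {
  const hits = [];
  // \s covers JavaScript's whitespace class; U+200B is listed separately
  // because \s does not match the zero-width space.
  const re = /[\s\u200B]/g;
  for (const m of str.matchAll(re)) {
    const hex = m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    hits.push({ index: m.index, codePoint: "U+" + hex });
  }
  return hits;
}

console.log(findWhitespace("a\u00A0b\u200Bc d"));
// [ { index: 1, codePoint: 'U+00A0' },
//   { index: 3, codePoint: 'U+200B' },
//   { index: 5, codePoint: 'U+0020' } ]
```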
Trim
Remove leading and trailing whitespace, leave the middle alone. Every language has a built-in. In regex:
str.replace(/^\s+|\s+$/g, "")
// Or JS built-in
str.trim(); // both ends
str.trimStart(); // leading only
str.trimEnd(); // trailing only
JavaScript’s trim uses the Unicode whitespace class, so it handles non-breaking space and the exotic Unicode spaces too.
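One gap worth knowing about: the zero-width space is not part of that class, so trim() leaves it in place. Below is a minimal sketch of a wider trim, assuming you also want U+200B removed from both ends; trimInvisible is a hypothetical name, not a built-in.

```javascript
// Hypothetical helper: like trim(), but also strips zero-width spaces (U+200B).
function trimInvisible(str) {
  return str.replace(/^[\s\u200B]+|[\s\u200B]+$/g, "");
}

console.log(JSON.stringify("\u00A0 hi \u00A0".trim()));         // "hi"
console.log("\u200Bhi\u200B".trim().length);                    // 4 (both U+200B survive)
console.log(JSON.stringify(trimInvisible("\u200B hi \u200B"))); // "hi"
```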
Collapse runs of whitespace
Replace any run of whitespace with a single space:
str.replace(/\s+/g, " ")
This flattens tabs, multiple spaces, and any line breaks inside. Combine with trim for the classic “clean up this mess” pass:
str.replace(/\s+/g, " ").trim()
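As a quick check of what that one-liner does to mixed input, here is a small sketch. The wrapper name collapseWhitespace is hypothetical.

```javascript
// Collapse every run of whitespace to one space, then trim both ends.
function collapseWhitespace(str) {
  return str.replace(/\s+/g, " ").trim();
}

// Spaces, a tab, a newline, and two non-breaking spaces.
const messy = "  a \t\n b\u00A0\u00A0c ";
console.log(JSON.stringify(collapseWhitespace(messy))); // "a b c"
```

Note that the line break in the input is gone: this pass is for single-line values such as names, titles, or search queries, not for multi-line text.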
Preserve line structure while collapsing intra-line runs
When you want clean lines but still want lines:
str
.split(/\r\n|\r|\n/)
.map(l => l.replace(/[^\S\n]+/g, " ").trim())
.join("\n")
[^\S\n] is “whitespace that isn’t a newline,” a classic trick.
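Wrapped in a function, the same approach can be exercised on input with mixed line endings. This is a sketch only, and collapsePerLine is a hypothetical name.

```javascript
// Clean each line separately so the line structure survives.
function collapsePerLine(str) {
  return str
    .split(/\r\n|\r|\n/)                                // accept CRLF, CR, or LF
    .map(line => line.replace(/[^\S\n]+/g, " ").trim()) // collapse runs, trim ends
    .join("\n");                                        // always emit LF
}

console.log(JSON.stringify(collapsePerLine("a   b \r\n\tc\t\td\n")));
// "a b\nc d\n"
```

Because the lines are rejoined with "\n", the output uses LF endings whatever the input used.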
Non-breaking spaces
NBSP (U+00A0) is the villain of copy-paste workflows. It looks identical to a space in most fonts but:
- Doesn’t match / / regex (which matches literal space only)
- Doesn’t break lines in HTML rendering
- Breaks naive split(" ") tokenization
It does match /\s/, which is why the collapse-runs regex handles it transparently. If you want to preserve NBSP (for typographic reasons) and only collapse regular spaces, be explicit:
str.replace(/ +/g, " ") // only ASCII space
str.replace(/\s+/g, " ") // all whitespace
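The failure modes listed above are easy to reproduce. The sketch below shows them along with one possible fix, converting NBSP to a regular space before tokenizing. Whether that is appropriate depends on whether the NBSPs are intentional, and normalizeNbsp is a hypothetical name.

```javascript
const pasted = "New\u00A0York City"; // NBSP between "New" and "York"

console.log(/New York/.test(pasted));  // false: literal space does not match NBSP
console.log(pasted.split(" ").length); // 2: "New\u00A0York" stays one token
console.log(/New\sYork/.test(pasted)); // true: \s matches NBSP

// Hypothetical helper: turn every NBSP into a regular space.
function normalizeNbsp(str) {
  return str.replace(/\u00A0/g, " ");
}

console.log(normalizeNbsp(pasted).split(" ").length); // 3
```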
Tab-to-space conversion
Tabs render differently across editors and cause alignment chaos in mixed-indent code. Convert to N spaces:
str.replace(/\t/g, "  ") // 2 spaces
str.replace(/\t/g, "    ") // 4 spaces
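The same replacement can be parameterized so the width is not hard-coded. A minimal sketch follows; tabsToSpaces and its width parameter are hypothetical.

```javascript
// Replace every tab with a fixed number of spaces.
function tabsToSpaces(str, width = 4) {
  return str.replace(/\t/g, " ".repeat(width));
}

console.log(JSON.stringify(tabsToSpaces("\tfoo", 2))); // "  foo"
console.log(JSON.stringify(tabsToSpaces("a\tb")));     // "a    b"
```

Each tab becomes exactly width spaces regardless of the column it starts in, so this suits leading indentation but will not keep tab-aligned columns lined up.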
For column alignment (tab-expand), you need tab stops:
function expandTabs(str, tabSize = 4) {
  return str.split("\n").map(line => {
    let out = "";
    for (const ch of line) {
      if (ch === "\t") {
        const pad = tabSize - (out.length % tabSize);
        out += " ".repeat(pad);
      } else {
        out += ch;
      }
    }
    return out;
  }).join("\n");
}
Preserving code indentation
This is the one case where you must not collapse leading whitespace. Indent levels carry meaning in code (Python especially, but also YAML, Makefiles, and any other format where leading whitespace is significant). Trim trailing whitespace, and collapse runs only in the part of the line after the indent:
str.split("\n").map(line => {
  const indent = line.match(/^[ \t]*/)[0];
  const rest = line.slice(indent.length).replace(/[ \t]+/g, " ").trimEnd();
  return indent + rest;
}).join("\n");
Trailing whitespace per line
The most broadly safe cleanup: strip trailing whitespace on every line. It rarely changes meaning (Markdown hard line breaks are the main exception) and cleans up editor artifacts.
str.replace(/[ \t]+$/gm, "")
The m flag makes $ match at line breaks, not just end-of-string.
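The effect of the flag is easiest to see side by side. A small sketch, with illustrative variable names:

```javascript
const lines = "a  \nb\t\nc ";

// Without m, $ only matches at the very end of the string,
// so only the last line is cleaned.
const withoutM = lines.replace(/[ \t]+$/g, "");
console.log(JSON.stringify(withoutM)); // "a  \nb\t\nc"

// With m, $ also matches before each line break.
const withM = lines.replace(/[ \t]+$/gm, "");
console.log(JSON.stringify(withM)); // "a\nb\nc"
```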
Blank-line collapse
Two or more blank lines become one:
str.replace(/\n{3,}/g, "\n\n")
Three or more newlines means two or more blank lines (because one newline is the end of a line, not a blank line).
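One caveat: a "blank" line that still holds spaces or tabs interrupts the run of newlines, so the pattern skips it. The sketch below, with illustrative variable names, shows the difference that stripping trailing whitespace first makes.

```javascript
// Four newlines after "a", then two lines containing only whitespace.
const doc = "a\n\n\n\nb\n  \n\t\nc";

// Collapsing alone misses the whitespace-only lines.
const naive = doc.replace(/\n{3,}/g, "\n\n");
console.log(JSON.stringify(naive)); // "a\n\nb\n  \n\t\nc"

// Strip trailing spaces and tabs per line first, then collapse.
const cleaned = doc
  .replace(/[ \t]+$/gm, "")
  .replace(/\n{3,}/g, "\n\n");
console.log(JSON.stringify(cleaned)); // "a\n\nb\n\nc"
```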
Full normalization pipeline
function cleanWhitespace(s) {
  return s
    .replace(/\r\n?/g, "\n") // normalize line endings
    .replace(/[ \t]+$/gm, "") // trim trailing per line
    .replace(/\n{3,}/g, "\n\n") // collapse blank lines
    .split("\n")
    .map(l => (l.trim() === "" ? "" : l)) // empty out lines holding only other whitespace (e.g. NBSP)
    .join("\n")
    .trim(); // strip whitespace at both ends of the whole text
}
Common mistakes
Using / / to match spaces and missing NBSP. Collapsing leading whitespace in code. Stripping all whitespace from CSV fields and losing significant spaces in names. Forgetting to normalize line endings before applying regexes, then missing matches on CRLF files. And stripping trailing whitespace in formats where it is significant, such as Markdown, where two trailing spaces produce a <br>.
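The CRLF mistake in particular is easy to overlook because nothing fails loudly; the regex simply never matches. A minimal sketch with illustrative variable names:

```javascript
// Three blank lines between "a" and "b", with Windows line endings.
const crlf = "a\r\n\r\n\r\n\r\nb";

// Every \n is separated from the next by a \r, so there is no run of three.
const missed = crlf.replace(/\n{3,}/g, "\n\n");
console.log(missed === crlf); // true: nothing was collapsed

// Normalize line endings first, then collapse.
const fixed = crlf
  .replace(/\r\n?/g, "\n")
  .replace(/\n{3,}/g, "\n\n");
console.log(JSON.stringify(fixed)); // "a\n\nb"
```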