How to Reverse Text
Character-level vs word-level reversal, Unicode complications (combining characters, emoji), RTL scripts, and palindrome checks.
Reversing a string is one of the first problems in any programming tutorial, which creates the illusion that it’s trivial. It isn’t. “Reverse this text” can mean reverse every character, reverse every word, reverse the bytes, or reverse the user-perceived characters, and each of those gives a different answer once you include emoji, combining diacritics, or right-to-left scripts. This guide covers the four common reversal definitions, why a naive character-by-character loop breaks modern Unicode text, and how to handle edge cases like flag emoji, Thai vowel marks, and Hebrew. It closes with palindrome checking and the gotchas specific to it.
Four definitions of “reverse”
Before writing code, pick a definition:
- Character reversal — last char first, first char last
- Word reversal — word order flips, word spelling stays
- Byte reversal — rarely what you want, shown for completeness
- Grapheme reversal — what users actually expect with emoji/diacritics
Character reversal: the naive version
The textbook approach works for pure ASCII:
"hello" -> "olleh"
// JS
[...str].reverse().join("")
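// Python
text[::-1]
Both snippets reverse code points, so they work until one user-perceived character spans more than one code point (combining marks, flags, skin-tone and ZWJ emoji). The even more naive str.split("") in JavaScript fails sooner, on anything beyond the Basic Multilingual Plane.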
Why naive reversal breaks on emoji
Many emoji are stored as surrogate pairs in UTF-16: two code units forming one user-perceived character. JavaScript’s str.split("") splits at the code-unit level, which tears surrogate pairs apart.
// Broken:
"a\uD83D\uDE00b".split("").reverse().join("")
// produces garbled surrogate order
// Correct:
[..."a\uD83D\uDE00b"].reverse().join("")
// spread uses the string iterator, which respects code points
Combining characters: the deeper problem
Even code-point iteration isn’t enough. An “é” can be one code point (U+00E9) or two (U+0065 + U+0301 combining acute). If you reverse the two-code-point form, the accent ends up on the wrong letter.
"cafe\u0301" reversed naively -> "\u0301efac"
// the combining mark now attaches to whatever
// was before it, not to "e"
The fix: split by grapheme clusters, not code points. Use Intl.Segmenter in modern JS or the third-party regex package in Python.
Grapheme-safe reversal
// JS
const seg = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = [...seg.segment(str)].map(s => s.segment);
const reversed = graphemes.reverse().join("");
// Python
import regex  # third-party package (pip install regex); the stdlib re module has no \X
graphemes = regex.findall(r"\X", text)
reversed_text = "".join(graphemes[::-1])
This handles flag emoji (regional-indicator pairs), skin-tone modifiers, ZWJ sequences (family emoji), and combining marks correctly.
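To see why the grapheme is the right unit, it can help to count the same string at three levels. The following is a minimal sketch, not a definitive implementation: toGraphemes, reverseGraphemes, and describe are hypothetical helper names, and it assumes a runtime that ships Intl.Segmenter (recent Node.js or a modern browser).

```javascript
// Assumes Intl.Segmenter is available (recent Node.js, modern browsers).
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

// Hypothetical helper: split a string into user-perceived characters.
function toGraphemes(str) {
  return [...segmenter.segment(str)].map((s) => s.segment);
}

// Hypothetical helper: reverse by grapheme cluster.
function reverseGraphemes(str) {
  return toGraphemes(str).reverse().join("");
}

// Hypothetical helper: count one string at three levels.
function describe(str) {
  return {
    codeUnits: str.length, // UTF-16 code units
    codePoints: [...str].length, // Unicode code points
    graphemes: toGraphemes(str).length, // user-perceived characters
  };
}

const flag = "\u{1F1EB}\u{1F1F7}"; // two regional indicators (F + R)
const thumb = "\u{1F44D}\u{1F3FD}"; // thumbs up + skin-tone modifier
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // three emoji joined by ZWJ

for (const sample of [flag, thumb, family, "e\u0301"]) {
  console.log(describe(sample));
}

// The flag stays intact between the two letters.
console.log(reverseGraphemes("a" + flag + "b") === "b" + flag + "a");
```

Each sample counts as several code units and code points but a single grapheme, which is why only the grapheme-level reversal keeps it in one piece.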
Word reversal
Word-level reversal flips the order of tokens without reversing each token. “The quick brown fox” becomes “fox brown quick The.”
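str.split(/\s+/).reverse().join(" ")
Watch the whitespace handling: with this one-liner, double spaces, tabs, and newlines all collapse to a single space, and leading or trailing whitespace produces an empty token at the start or end. Decide whether you want to preserve exact whitespace or normalize.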
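The two whitespace policies can be sketched side by side. The function names here are hypothetical, and this is one possible approach rather than the only one.

```javascript
// Option 1: normalize. Trim first so leading/trailing whitespace
// does not produce empty tokens, then join with single spaces.
function reverseWordsNormalized(str) {
  return str.trim().split(/\s+/).reverse().join(" ");
}

// Option 2: preserve. A capturing group makes split() keep the
// whitespace runs as array elements, so they survive the reversal.
function reverseWordsPreserving(str) {
  return str.split(/(\s+)/).reverse().join("");
}

console.log(JSON.stringify(reverseWordsNormalized("  The quick\tbrown  fox ")));
console.log(JSON.stringify(reverseWordsPreserving("a  b\tc")));
```

In the preserving version each whitespace run keeps its exact content, but its position is mirrored along with the words, so leading whitespace ends up trailing.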
Right-to-left scripts
Arabic, Hebrew, and Persian already display right-to-left. Reversing them at the character level produces text that displays in left-to-right order, which looks “forward” to an LTR reader but is actually a scrambled string. Reversing is almost never what you want for RTL content. If you’re rendering a mixed-script sentence, the Unicode bidirectional algorithm handles visual order separately from storage order — don’t fight it.
Byte reversal
Reversing raw UTF-8 bytes produces invalid UTF-8 in almost every case and should be avoided unless you’re doing low-level work on ASCII-only data. The multi-byte continuation bytes will end up in positions where lead bytes belong.
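A short sketch makes the failure concrete. It assumes the standard TextEncoder and TextDecoder globals; reverseBytes is a hypothetical helper name used only for illustration.

```javascript
// TextEncoder/TextDecoder are standard globals in browsers and Node.js.
const encoder = new TextEncoder();
// fatal: true makes decode() throw on invalid UTF-8 instead of
// silently substituting U+FFFD replacement characters.
const strictDecoder = new TextDecoder("utf-8", { fatal: true });

// Hypothetical helper: reverse the UTF-8 bytes, then try to decode.
// Returns the decoded string, or null if the bytes are not valid UTF-8.
function reverseBytes(str) {
  const bytes = encoder.encode(str).reverse();
  try {
    return strictDecoder.decode(bytes);
  } catch {
    return null;
  }
}

console.log(reverseBytes("hello")); // ASCII: one byte per character, so this survives
console.log(reverseBytes("h\u00E9")); // é is two bytes (C3 A9); reversed, A9 comes first
```

Without the fatal option the decoder would not throw; it would quietly emit replacement characters, which hides the corruption rather than preventing it.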
Palindrome checking
The classic application. Canonical workflow:
- Lowercase
- Strip punctuation and whitespace
- Normalize Unicode (NFC)
- Compare to its grapheme-reversed self
function isPalindrome(s) {
  // Fold case and normalize first, then keep only letters and digits.
  const norm = s.toLowerCase()
    .normalize("NFC")
    .replace(/[^\p{L}\p{N}]/gu, "");
  const seg = new Intl.Segmenter("en", { granularity: "grapheme" });
  const graphs = [...seg.segment(norm)].map(x => x.segment);
  // slice() copies the array so reverse() does not mutate graphs.
  return graphs.join("") === graphs.slice().reverse().join("");
}
Line-by-line reversal
A different kind of “reverse”: keep each line intact but put the last line first. Useful for chronological log files.
str.split("\n").reverse().join("\n")
Performance notes
For strings under ~10 KB, grapheme segmentation is fast enough that you shouldn’t worry. For multi-MB inputs, iterator-based approaches beat splitting the whole thing into an array. Streaming grapheme segmentation requires buffer handling because a grapheme can span chunks.
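The buffer handling for chunked input might look like the sketch below. It is one possible approach under stated assumptions: graphemesFromChunks is a hypothetical helper, the chunks are already-decoded strings, and Intl.Segmenter is available.

```javascript
// Assumes Intl.Segmenter is available (recent Node.js, modern browsers).
const chunkSegmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

// Hypothetical helper: yield graphemes from an iterable of string chunks.
// The last segment of each chunk is held back, because the next chunk
// may start with a combining mark, modifier, or joiner that extends it.
function* graphemesFromChunks(chunks) {
  let carry = "";
  for (const chunk of chunks) {
    const segments = [...chunkSegmenter.segment(carry + chunk)].map((s) => s.segment);
    carry = segments.pop() ?? "";
    yield* segments;
  }
  // Nothing can extend the held-back segment any more, so emit it.
  if (carry !== "") yield carry;
}

// The combining accent arrives in the second chunk but still joins its "e".
const streamed = [...graphemesFromChunks(["cafe", "\u0301 ok"])];
console.log(streamed.length);
```

Reversal still needs the graphemes collected (or the chunks read back to front), since the last grapheme has to come out first; the sketch only shows how to avoid cutting a grapheme at a chunk boundary.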
Common mistakes
- Splitting on the empty string and reversing, which breaks emoji and combining marks.
- Reversing and then lowercasing when checking palindromes. Do it in the other order, because case changes in some scripts change the code-point count.
- Forgetting to normalize before comparing, so “café” and “cafe + combining accent” compare unequal.
- Expecting meaningful output from reversing RTL text.
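Two of these mistakes are easy to reproduce. The snippet below is a small illustration using only built-in string methods; the variable names are arbitrary.

```javascript
// 1. Normalization: the same visible word in two encodings.
const composed = "caf\u00E9"; // precomposed é (U+00E9)
const decomposed = "cafe\u0301"; // e + combining acute (U+0301)
const rawEqual = composed === decomposed;
const nfcEqual = composed.normalize("NFC") === decomposed.normalize("NFC");

// 2. Case mapping can change the number of code points, so fold case
// before segmenting and reversing, not after.
const sharpS = "\u00DF"; // ß
const dottedI = "\u0130"; // İ (capital I with dot above)
const upperLen = [...sharpS.toUpperCase()].length; // uppercases to "SS"
const lowerLen = [...dottedI.toLowerCase()].length; // lowercases to i + U+0307

console.log({ rawEqual, nfcEqual, upperLen, lowerLen });
```

The raw comparison fails while the normalized one succeeds, and both single-code-point letters grow to two code points after the case change.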