Free Tool Arena

How to Reverse Text

Character-level vs word-level reversal, Unicode complications (combining characters, emoji), RTL scripts, and palindrome checks.

Updated April 2026 · 6 min read

Reversing a string is one of the first problems in any programming tutorial, which creates the illusion that it’s trivial. It isn’t. “Reverse this text” can mean reverse every character, reverse every word, reverse the bytes, or reverse the user-perceived characters, and each of those gives a different answer once you include emoji, combining diacritics, or right-to-left scripts. This guide covers the four common reversal definitions, why a naive character-by-character loop breaks modern Unicode text, and how to handle edge cases like flag emoji, Thai vowel marks, and Hebrew. It also covers palindrome checking and the gotchas specific to it.

Four definitions of “reverse”

Before writing code, pick a definition:

  • Character reversal — last char first, first char last
  • Word reversal — word order flips, word spelling stays
  • Byte reversal — rarely what you want, shown for completeness
  • Grapheme reversal — what users actually expect with emoji/diacritics

Character reversal: the naive version

The textbook approach works for pure ASCII:

"hello" -> "olleh"

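// JS (naive: splits on UTF-16 code units)
str.split("").reverse().join("")

// Python (reverses code points)
text[::-1]

The JavaScript version works until the input contains anything beyond the Basic Multilingual Plane. The Python slice survives those characters because it reverses whole code points, but both versions still break on combining marks.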

Why naive reversal breaks on emoji

Many emoji lie outside the Basic Multilingual Plane, so UTF-16 stores each one as a surrogate pair: two code units forming one code point. JavaScript’s str.split("") splits at the code-unit level, which tears surrogate pairs apart.

// Broken:
"a\uD83D\uDE00b".split("").reverse().join("")
// "b\uDE00\uD83Da": the surrogates are swapped, so the emoji is lost

// Correct:
[..."a\uD83D\uDE00b"].reverse().join("")
// "b\uD83D\uDE00a": spread uses the string iterator, which walks code points

Combining characters: the deeper problem

Even code-point iteration isn’t enough. An “é” can be one code point (U+00E9) or two (U+0065 + U+0301 combining acute). If you reverse the two-code-point form, the accent ends up on the wrong letter.

"cafe\u0301" reversed naively -> "\u0301efac"
// the combining mark now leads the string, so it attaches to
// whatever text precedes it (or to nothing), not to "e"

The fix: split by grapheme clusters, not code points. Use Intl.Segmenter in modern JS or the third-party regex package in Python.

Grapheme-safe reversal

// JS
const seg = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = [...seg.segment(str)].map(s => s.segment);
const reversed = graphemes.reverse().join("");

// Python (third-party package: pip install regex)
import regex
graphemes = regex.findall(r"\X", text)  # \X matches one extended grapheme cluster
reversed_text = "".join(graphemes[::-1])

This handles flag emoji (regional-indicator pairs), skin-tone modifiers, ZWJ sequences (family emoji), and combining marks correctly.
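
As a quick check of that claim, the sketch below wraps the Intl.Segmenter approach in a helper and compares it with code-point reversal on a decomposed “é” and a flag emoji. The function names are illustrative, not from any library, and the code assumes a runtime that ships Intl.Segmenter (a recent Node.js or browser).

```javascript
// Hypothetical helper names; only Intl.Segmenter is a built-in API.

// Reverses by code point: safe for surrogate pairs, not for clusters.
function reverseCodePoints(text) {
  return [...text].reverse().join("");
}

// Reverses by extended grapheme cluster (user-perceived character).
function reverseGraphemes(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  const clusters = Array.from(segmenter.segment(text), (s) => s.segment);
  return clusters.reverse().join("");
}

const decomposedCafe = "cafe\u0301";        // "e" followed by a combining acute
const withFlag = "a\u{1F1E8}\u{1F1E6}b";    // regional indicators C + A

// Code-point reversal detaches the accent and swaps the two indicators.
const brokenCafe = reverseCodePoints(decomposedCafe); // "\u0301efac"
const brokenFlag = reverseCodePoints(withFlag);       // "b\u{1F1E6}\u{1F1E8}a"

// Grapheme reversal keeps each cluster intact.
const goodCafe = reverseGraphemes(decomposedCafe);    // "e\u0301fac"
const goodFlag = reverseGraphemes(withFlag);          // "b\u{1F1E8}\u{1F1E6}a"
```

The locale argument is passed through only for completeness; "en" mirrors the snippet above.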

Word reversal

Word-level reversal flips the order of tokens without reversing each token. “The quick brown fox” becomes “fox brown quick The.”

str.split(/\s+/).reverse().join(" ")

Watch the whitespace handling — double spaces, tabs, newlines. Decide whether you want to preserve exact whitespace or normalize.
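
The two whitespace policies can be sketched as separate helpers. Both names are hypothetical. The first normalizes every run of whitespace to a single space; the second leaves each whitespace run exactly where it was and only swaps the words between them.

```javascript
// Hypothetical helpers illustrating the two whitespace policies.

// Normalize: trim the ends, collapse every whitespace run to one space.
function reverseWordsNormalized(text) {
  const trimmed = text.trim();
  if (trimmed === "") return "";
  return trimmed.split(/\s+/).reverse().join(" ");
}

// Preserve: keep every whitespace run in place, fill the word slots
// from left to right with the words taken from the end.
function reverseWordsKeepWhitespace(text) {
  const words = text.match(/\S+/g) ?? [];
  return text.replace(/\S+/g, () => words.pop());
}

const messy = "The  quick\tbrown fox";
const normalized = reverseWordsNormalized(messy);    // "fox brown quick The"
const preserved = reverseWordsKeepWhitespace(messy); // "fox  brown\tquick The"
```

Preserving the original separators matters when the text is column-aligned or contains meaningful tabs; normalizing is usually fine for a single sentence.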

Right-to-left scripts

Arabic, Hebrew, and Persian already display right-to-left. Reversing them at the character level produces text that displays in left-to-right order, which looks “forward” to an LTR reader but is actually a scrambled string. Reversing is almost never what you want for RTL content. If you’re rendering a mixed-script sentence, the Unicode bidirectional algorithm handles visual order separately from storage order — don’t fight it.
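
One way to act on this is to detect RTL content before reversing and warn or skip. The guard below is a minimal sketch under a stated assumption: it checks only the Hebrew and Arabic scripts (Persian is written in the Arabic script) and ignores other RTL scripts. The name containsRtlScript is hypothetical.

```javascript
// Hypothetical guard; covers only the Hebrew and Arabic scripts.
// No "g" flag, so test() carries no state between calls.
const RTL_SCRIPT = /[\p{Script=Hebrew}\p{Script=Arabic}]/u;

function containsRtlScript(text) {
  return RTL_SCRIPT.test(text);
}

const hebrew = "\u05E9\u05DC\u05D5\u05DD";  // Hebrew letters
const arabic = "\u0633\u0644\u0627\u0645";  // Arabic letters
const latin = "hello";

const flags = [hebrew, arabic, latin].map(containsRtlScript);
```

A caller could use the result to show a notice instead of silently producing scrambled output.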

Byte reversal

Reversing raw UTF-8 bytes produces invalid UTF-8 as soon as the input contains any non-ASCII character, so avoid it unless you’re doing low-level work on ASCII-only data. The continuation bytes of each multi-byte sequence end up ahead of their lead byte, in positions where lead bytes belong.
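
The failure is easy to reproduce with the built-in TextEncoder and TextDecoder. This is a sketch shown for completeness; the helper names are hypothetical, and the `fatal: true` option makes the decoder throw on malformed input instead of substituting replacement characters.

```javascript
// Hypothetical helpers; TextEncoder and TextDecoder are built-ins.

// Encodes to UTF-8 and reverses the byte array in place.
function reverseUtf8Bytes(text) {
  const bytes = new TextEncoder().encode(text);
  return bytes.reverse();
}

// Returns true only if the bytes decode as well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

const asciiOk = isValidUtf8(reverseUtf8Bytes("hello"));        // true
const accentedOk = isValidUtf8(reverseUtf8Bytes("caf\u00E9")); // false

// U+00E9 encodes as C3 A9; reversed, the continuation byte A9 comes first.
const reversedBytes = Array.from(reverseUtf8Bytes("caf\u00E9"));
```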

Palindrome checking

The classic application. Canonical workflow:

  • Lowercase
  • Normalize Unicode (NFC)
  • Strip punctuation and whitespace
  • Compare to its grapheme-reversed self

function isPalindrome(s) {
  // Lowercase, compose, then drop everything except letters, digits,
  // and marks (marks stay so accents without a precomposed form survive).
  const norm = s.toLowerCase()
    .normalize("NFC")
    .replace(/[^\p{L}\p{N}\p{M}]/gu, "");
  const seg = new Intl.Segmenter("en", { granularity: "grapheme" });
  const graphs = [...seg.segment(norm)].map(x => x.segment);
  return graphs.join("") === graphs.slice().reverse().join("");
}

Line-by-line reversal

A different kind of “reverse”: keep each line intact but put the last line first. Useful for chronological log files.

str.split("\n").reverse().join("\n")
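
The one-liner has two rough edges on real files: a trailing newline turns into a leading blank line, and Windows line endings leave stray "\r" characters. A minimal sketch that handles both follows; the name reverseLines is hypothetical, and the helper normalizes CRLF to LF in its output, which is an assumption you may not want.

```javascript
// Hypothetical helper: reverses line order, keeps a trailing newline
// at the end, and accepts both LF and CRLF input (output uses LF).
function reverseLines(text) {
  const hasTrailingNewline = /\r?\n$/.test(text);
  const body = hasTrailingNewline ? text.replace(/\r?\n$/, "") : text;
  const reversed = body.split(/\r?\n/).reverse().join("\n");
  return hasTrailingNewline ? reversed + "\n" : reversed;
}

const log = "first\nsecond\nthird\n";

// Naive version: the empty string after the final "\n" moves to the front.
const naive = log.split("\n").reverse().join("\n"); // "\nthird\nsecond\nfirst"

const fixed = reverseLines(log);                    // "third\nsecond\nfirst\n"
```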

Performance notes

For strings under ~10 KB, grapheme segmentation is fast enough that you shouldn’t worry. For multi-MB inputs, iterator-based approaches beat splitting the whole thing into an array. Streaming grapheme segmentation requires buffer handling because a grapheme can span chunks.

Common mistakes

  • Splitting on the empty string and reversing, which breaks emoji and combining marks.
  • Reversing and then lowercasing in a palindrome check. Do it in the other order: case changes in some scripts change the code-point count.
  • Forgetting to normalize before comparing, so “café” and “cafe + combining accent” compare unequal.
  • Expecting meaningful output from reversing RTL text.
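
Two of these mistakes can be checked directly. The sketch below compares the composed and decomposed spellings of “café” with and without normalization, and shows one case change that alters the code-point count. The helper name sameText is hypothetical.

```javascript
const composed = "caf\u00E9";     // "é" as the single code point U+00E9
const decomposed = "cafe\u0301";  // "e" followed by combining acute U+0301

// Hypothetical helper: compares two strings after NFC normalization.
function sameText(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

const rawEqual = composed === decomposed;            // false
const normalizedEqual = sameText(composed, decomposed); // true

// Lowercasing U+0130 (capital I with dot above) yields "i" + U+0307,
// so one code point becomes two.
const dottedI = "\u0130";
const before = [...dottedI].length;
const after = [...dottedI.toLowerCase()].length;
```

This is why the palindrome workflow lowercases and normalizes before it segments and compares.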
