How to Sort Lines of Text
Alphabetical vs numeric, case-sensitive vs insensitive, natural sort, stable sort, reverse sort, and locale considerations.
Sorting a list of lines is a one-click operation until the moment you realize the sort put “Item 10” before “Item 2,” split “apple” and “Apple” into separate groups, or dropped Spanish ñ into the wrong alphabetical slot. “Sort these lines” hides a dozen decisions: alphabetical or numeric, case-sensitive or not, natural or lexicographic, stable or not, and which locale’s collation rules apply. This guide walks through each choice, shows where the common mistakes happen, and covers the specialized modes that handle real-world data correctly: natural sort for mixed alphanumeric text, reverse sort for most-recent-first lists, and locale-aware sort for non-English text.
Lexicographic vs alphabetical vs numeric
Most “alphabetical” sorting is technically lexicographic: a character-by-character comparison by code point. That works for words made of same-case letters but produces surprises as soon as digits appear:
lexicographic:    numeric:
Apple             Apple
Banana            Banana
Item 10           Item 2
Item 2            Item 10
Zebra             Zebra
“Item 10” sorts before “Item 2” lexicographically because the comparison stops at the first differing character, and ‘1’ < ‘2’ in ASCII. Natural sort fixes this.
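A minimal JavaScript sketch of the difference, assuming Node.js or a modern browser. Array.prototype.sort with no comparator converts elements to strings and compares them code unit by code unit, which is lexicographic; a numeric sort needs an explicit comparator. The sample values are illustrative.

```javascript
const values = ["10", "9", "2", "100"];

// No comparator: elements are compared as strings, code unit by code unit.
const lexicographic = [...values].sort();

// Numeric comparator: parse each line as a number and subtract.
const numeric = [...values].sort((a, b) => Number(a) - Number(b));

console.log(lexicographic); // [ '10', '100', '2', '9' ]
console.log(numeric);       // [ '2', '9', '10', '100' ]
```

Number() turns a non-numeric line into NaN, and a comparator that returns NaN no longer defines a consistent order, so validate or filter such lines before a purely numeric sort.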
Case sensitivity
In ASCII, uppercase letters all have lower code points than lowercase. A case-sensitive sort produces:
Apple
Banana
apple
banana
Case-insensitive sort groups them together:
apple
Apple
banana
Banana
Case-insensitive is almost always what humans want for human-readable lists. Case-sensitive is right when you’re sorting identifiers, code, or anything where case carries meaning.
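Both orders can be reproduced in a few lines of JavaScript. This sketch assumes an "en" locale for the case-insensitive version: localeCompare compares the letters first and uses case only to break ties, and the default English collation puts lowercase first on such ties.

```javascript
const words = ["banana", "Apple", "apple", "Banana"];

// Case-sensitive: default sort compares code units, so A-Z (65-90) precede a-z (97-122).
const caseSensitive = [...words].sort();

// Case-insensitive grouping: locale-aware comparison ("en" is an assumed locale).
// Letters decide the order; case only breaks ties between otherwise equal words.
const caseInsensitive = [...words].sort((a, b) => a.localeCompare(b, "en"));

console.log(caseSensitive);   // [ 'Apple', 'Banana', 'apple', 'banana' ]
console.log(caseInsensitive); // [ 'apple', 'Apple', 'banana', 'Banana' ]
```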
Natural sort
Natural sort recognizes runs of digits and compares them numerically. It’s what you want for filenames, version numbers, chapter lists, and anything with embedded numbers.
input:          v1.2  v1.10  v1.9
lexicographic:  v1.10  v1.2  v1.9
natural sort:   v1.2  v1.9  v1.10
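To make the digit-run idea concrete, here is a minimal hand-rolled natural comparator. It is an illustrative sketch, not a library API: the names chunks and naturalCompare are hypothetical, text runs are compared by plain code units, and digit runs are compared by numeric value.

```javascript
// Split a string into alternating runs of digits and non-digits,
// e.g. "v1.10" -> ["v", "1", ".", "10"].
function chunks(s) {
  return s.match(/\d+|\D+/g) ?? [];
}

// Hypothetical hand-rolled natural comparator (sketch only).
function naturalCompare(a, b) {
  const ca = chunks(a);
  const cb = chunks(b);
  const n = Math.min(ca.length, cb.length);
  for (let i = 0; i < n; i++) {
    const x = ca[i];
    const y = cb[i];
    if (/^\d/.test(x) && /^\d/.test(y)) {
      // Both runs are digits: compare by numeric value.
      // BigInt avoids precision loss on very long digit runs.
      const diff = BigInt(x) - BigInt(y);
      if (diff !== 0n) return diff < 0n ? -1 : 1;
    } else if (x !== y) {
      // Otherwise fall back to plain code-unit comparison of the runs.
      return x < y ? -1 : 1;
    }
  }
  // All shared runs are equal: the string with fewer runs sorts first.
  return ca.length - cb.length;
}

console.log(["v1.2", "v1.10", "v1.9"].sort(naturalCompare)); // [ 'v1.2', 'v1.9', 'v1.10' ]
console.log(["Item 10", "Item 2"].sort(naturalCompare));     // [ 'Item 2', 'Item 10' ]
```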
In JavaScript, localeCompare with { numeric: true } as its options argument (a.localeCompare(b, undefined, { numeric: true })) gives you natural sort for free.
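When sorting many lines, a reusable Intl.Collator with the same numeric option avoids re-parsing the options on every comparison. A short sketch, assuming an "en" locale; collator.compare is already bound, so it can be passed straight to sort.

```javascript
// One collator reused for every comparison; "en" is an assumed locale.
// numeric: true makes runs of digits compare by value.
const natural = new Intl.Collator("en", { numeric: true });

const versions = ["v1.2", "v1.10", "v1.9"];
const files = ["file10.txt", "file2.txt", "file1.txt"];

console.log([...versions].sort(natural.compare)); // [ 'v1.2', 'v1.9', 'v1.10' ]
console.log([...files].sort(natural.compare));    // [ 'file1.txt', 'file2.txt', 'file10.txt' ]

// The one-off form with localeCompare gives the same answer for a single pair.
console.log("Item 10".localeCompare("Item 2", undefined, { numeric: true }) > 0); // true
```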
Stable sort
A stable sort preserves the relative order of lines that compare equal. An unstable sort may reshuffle them. This matters when you sort on one key and want a previous ordering preserved as the tiebreaker.
input (sorted by name):
Alice 30
Bob 30
Carol 25
sort by age (stable):
Carol 25
Alice 30
Bob 30
sort by age (unstable): Alice/Bob order may flip
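Most modern languages’ built-in sorts are stable: JavaScript’s Array.prototype.sort (guaranteed since ES2019), Python’s sorted and list.sort, Java’s Collections.sort. Unix sort is stable with -s; without it, GNU sort breaks ties between equal keys by comparing the whole lines.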
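The name-then-age example above, sketched in JavaScript. It relies only on the ES2019 stability guarantee; the records are the illustrative ones from the example. The same property is what makes multi-key sorting work: sort by the secondary key first, then stably by the primary key.

```javascript
// Records already ordered by name.
const people = [
  { name: "Alice", age: 30 },
  { name: "Bob", age: 30 },
  { name: "Carol", age: 25 },
];

// Sort by age only. Array.prototype.sort is stable since ES2019, so Alice
// stays ahead of Bob because she was ahead of him in the input.
const byAge = [...people].sort((a, b) => a.age - b.age);

console.log(byAge.map((p) => `${p.name} ${p.age}`));
// [ 'Carol 25', 'Alice 30', 'Bob 30' ]
```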
Reverse sort
Sort ascending and then reverse, or pass a descending flag or comparator. Most languages have a one-liner:
// JS
arr.sort((a, b) => b.localeCompare(a));
# Python
sorted(lines, reverse=True)
# Unix
sort -r
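The two approaches are not identical when lines tie. A descending comparator keeps tied lines in their input order, while sorting ascending and then calling reverse() flips them. (Python documents that reverse=True still preserves stability.) A small JavaScript sketch with illustrative records:

```javascript
const people = [
  { name: "Alice", age: 30 },
  { name: "Bob", age: 30 },
  { name: "Carol", age: 25 },
];
const label = (p) => p.name;

// Descending comparator: the tie (Alice, Bob) keeps its input order.
const descending = [...people].sort((a, b) => b.age - a.age);

// Ascending sort, then reverse(): the tie comes out in the opposite order.
const reversed = [...people].sort((a, b) => a.age - b.age).reverse();

console.log(descending.map(label)); // [ 'Alice', 'Bob', 'Carol' ]
console.log(reversed.map(label));   // [ 'Bob', 'Alice', 'Carol' ]
```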
Locale-aware collation
Code-point order is not alphabetical order in many languages. A few examples:
- Spanish ñ sorts between n and o
- German ß traditionally sorts as “ss”
- Swedish å ä ö sort after z, not as variants of a/o
- Czech ch is a single collation unit
- French ignores accents at the primary level and uses them only as tiebreakers
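The Spanish and Swedish rules from the list are easy to check against plain code-point order. This sketch assumes a runtime with full ICU locale data, which current official Node.js builds and major browsers ship; the word lists are illustrative.

```javascript
const spanish = ["o", "ña", "nz"];
const swedish = ["ö", "z", "a", "å", "ä"];

// Code-point order: ñ (U+00F1) and å/ä/ö (U+00E5, U+00E4, U+00F6) land after
// every ASCII letter, and in code-point order rather than alphabet order.
console.log([...spanish].sort()); // [ 'nz', 'o', 'ña' ]
console.log([...swedish].sort()); // [ 'a', 'z', 'ä', 'å', 'ö' ]

// Locale-aware collation: ñ between n and o; å, ä, ö after z.
console.log([...spanish].sort(new Intl.Collator("es").compare)); // [ 'nz', 'ña', 'o' ]
console.log([...swedish].sort(new Intl.Collator("sv").compare)); // [ 'a', 'z', 'å', 'ä', 'ö' ]
```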
Use Intl.Collator with the right locale:
const coll = new Intl.Collator("sv");
arr.sort(coll.compare);
Diacritic handling
In Intl.Collator, base-letter vs diacritic-aware comparison is controlled by the sensitivity option:
- base: a = á = A = Á
- accent: a = A, a ≠ á
- case: a = á, a ≠ A
- variant: distinguishes everything (default)
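A quick way to see the four levels side by side: build one collator per sensitivity and record which pairs it treats as equal. The helper name equalityTable and the "en" locale are assumptions for illustration.

```javascript
// Hypothetical helper: for each sensitivity level, report which pairs
// the collator treats as equal (compare returns 0).
function equalityTable(locale, pairs) {
  const table = {};
  for (const sensitivity of ["base", "accent", "case", "variant"]) {
    const coll = new Intl.Collator(locale, { sensitivity });
    table[sensitivity] = pairs.map(([x, y]) => coll.compare(x, y) === 0);
  }
  return table;
}

// Pairs: [a vs á, a vs A]
const table = equalityTable("en", [["a", "á"], ["a", "A"]]);
console.log(table);
```

For sorting, a coarser sensitivity only changes which lines count as ties; a stable sort then keeps tied lines in their input order.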
Sorting with a header line
A lot of real data has a header or title line that shouldn’t be sorted. Either strip it first, prefix it with a character that sorts before digits and letters in ASCII (such as ! or #), or sort everything except the first line and then put the header back on top.
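The third option, sketched in JavaScript. The function name sortKeepingHeader is hypothetical, and the sketch assumes newline-separated text; it also preserves a trailing newline, which would otherwise become an empty line sorted to the top of the body.

```javascript
// Hypothetical helper: sort every line except the first, then put the header back.
function sortKeepingHeader(text, compare) {
  const trailing = text.endsWith("\n") ? "\n" : "";
  const body = text.slice(0, text.length - trailing.length);
  const [header, ...rows] = body.split("\n");
  return [header, ...rows.sort(compare)].join("\n") + trailing;
}

const input = "Name\nCarol\nalice\nBob";
const coll = new Intl.Collator("en"); // "en" is an assumed locale
console.log(sortKeepingHeader(input, coll.compare));
// Name
// alice
// Bob
// Carol
```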
Sorting by column
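For tab- or comma-separated data, you usually want to sort by one column, not the whole line. On Unix, sort -t, -k2,2 sorts comma-separated lines by the second field only; plain -k2 uses everything from the second field to the end of the line, and fields are split on whitespace unless you set -t. In a spreadsheet, sort the range and pick the column as key. A naive line sort on CSV data effectively sorts by the first column, with later columns only breaking ties.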
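A minimal column sort in JavaScript, assuming simple CSV with no quoted fields that contain commas (real CSV needs a proper parser) and rows that all have the requested column. The helper name sortByColumn is hypothetical, and the column index is zero-based, so 1 means the second field. A numeric-aware collator is used so that 3 sorts before 12.

```javascript
// Hypothetical helper: sort comma-separated rows by one zero-based column.
// Naive split: does not handle quoted fields.
function sortByColumn(rows, column, compare) {
  return [...rows].sort((a, b) => compare(a.split(",")[column], b.split(",")[column]));
}

const rows = ["pear,12", "apple,3", "fig,100"];
const natural = new Intl.Collator("en", { numeric: true });

console.log(sortByColumn(rows, 1, natural.compare)); // [ 'apple,3', 'pear,12', 'fig,100' ]
console.log([...rows].sort());                       // [ 'apple,3', 'fig,100', 'pear,12' ]
```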
Large files and memory
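Sorting a 10 GB file entirely in memory usually won’t work. Use an external merge sort, or the Unix sort command, which writes sorted runs to temporary files and merges them automatically. With GNU sort, set the temporary directory (-T) and the memory buffer size (-S) explicitly when the defaults don’t fit: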
sort -T /tmp -S 2G input.txt -o output.txt
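The two phases of an external merge sort can be illustrated in memory. This is a simplified sketch, not a disk-backed implementation: a real one would stream the input, write each sorted run to a temporary file, read runs back lazily, and typically use a min-heap for the merge. The function names and run size are illustrative assumptions.

```javascript
// Plain code-unit comparison, the same order as sort() with no comparator.
const byCodeUnit = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Phase 1: cut the input into runs of at most runSize lines and sort each run.
// (A real external sort would write each run to a temp file here.)
function makeSortedRuns(lines, runSize, compare) {
  const runs = [];
  for (let i = 0; i < lines.length; i += runSize) {
    runs.push(lines.slice(i, i + runSize).sort(compare));
  }
  return runs;
}

// Phase 2: k-way merge. Repeatedly emit the smallest head among all runs.
// Ties go to the earlier run, which keeps the overall sort stable.
function mergeRuns(runs, compare) {
  const pos = runs.map(() => 0); // next unread index in each run
  const out = [];
  for (;;) {
    let best = -1;
    for (let r = 0; r < runs.length; r++) {
      if (pos[r] >= runs[r].length) continue; // run exhausted
      if (best === -1 || compare(runs[r][pos[r]], runs[best][pos[best]]) < 0) {
        best = r;
      }
    }
    if (best === -1) return out; // every run exhausted
    out.push(runs[best][pos[best]++]);
  }
}

const lines = ["delta", "alpha", "charlie", "bravo", "echo"];
const runs = makeSortedRuns(lines, 2, byCodeUnit);
console.log(mergeRuns(runs, byCodeUnit)); // [ 'alpha', 'bravo', 'charlie', 'delta', 'echo' ]
```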
Common mistakes
- Using lexicographic sort for filenames and getting “file10” before “file2.”
- Running a case-sensitive sort on mixed-case human names.
- Sorting German or Spanish lists with default code-point (ASCII) collation.
- Assuming the output is stable when you ran an unstable algorithm.
- Forgetting that your header line is now somewhere in the middle of the list.